Hardware Execution Driven Application Level Derating Calculation for Soft Error Rate Analysis

ABSTRACT

Mechanisms are provided for predicting effects of soft errors on an integrated circuit device design. A data processing system is configured to implement a unified derating tool that includes a machine derating front-end engine used to generate machine derating information, and an application derating front-end engine used to generate application derating information, for the integrated circuit device design. The machine derating front-end engine executes a simulation of the integrated circuit device design to generate the machine derating information. The application derating front-end engine executes an application workload on existing hardware similar in architecture to the integrated circuit device design and injects a fault into the existing hardware during execution of the application workload to generate application derating information. The machine derating information is combined with the application derating information to generate at least one soft error rate value for the integrated circuit device design.

BACKGROUND

The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for collecting data related to the masking (or derating) characteristics of a given workload when executed on a microprocessor based computing system, under an environment where transient errors may affect the correctness of individual data storage bits during the computation process.

As technological trends head toward smaller devices and wire dimensions, system design is entering an era of increased chip integration, reduced supply voltages, and higher frequencies. An inescapable consequence of this development is the fact that transient/soft errors will continue to be a serious threat to the general technology of robust computing. Transient errors may occur due to a variety of events, most notable among them being the impact of high energy cosmic particles, alpha particle effects due to the presence of lead in packaging materials, and inductive noise effects (Ldi/dt) on the chip supply voltage resulting from aggressive forms of dynamic power management.

Current soft error rate (SER) projections for Static Random Access Memory (SRAM) cells, latch elements, and logic elements, as technology scales from 65 nm towards 45 nm and beyond, indicate that the SER per bit for SRAM cells appears to be leveling off. However, it must be noted that the bit count per chip is increasing exponentially, per Moore's Law. Latch SER is catching up with SRAM per-bit rates with a steeper slope of increase. Logic SER is projected to increase at a much faster pace, although the absolute numbers are significantly smaller than SRAM or latch numbers at the present time. For Silicon On Insulator (SOI) technology, going forward from 65 nm to 45 nm technology, the latch SER per bit is predicted to increase 2× to 5×, and latches per chip are of course expected to increase with integration density. Again, storage cell SER will still dominate and latch errors will also be of increasing relevance at 45 nm technologies and beyond.

SUMMARY

In one illustrative embodiment, a method, in a data processing system, is provided for predicting effects of soft errors on an integrated circuit device design. The method comprises configuring the data processing system to implement a unified derating tool. The unified derating tool comprises a machine derating front-end engine used to generate machine derating information for the integrated circuit device design, and an application derating front-end engine used to generate application derating information for the integrated circuit device design. The method further comprises executing in the data processing system, by the unified derating tool, the machine derating front-end engine on a simulation of the integrated circuit device design to generate the machine derating information. Moreover, the method comprises executing in the data processing system, by the unified derating tool, the application derating front-end engine to execute an application workload on existing hardware similar in architecture to the integrated circuit device design and inject a fault into the existing hardware during execution of the application workload on the existing hardware to generate application derating information. In addition, the method comprises combining, by the data processing system, the machine derating information with the application derating information to generate at least one soft error rate (SER) value for the integrated circuit device design. The SER value may be measured in a standard unit of failures in time (FIT).

In other illustrative embodiments, a computer program product comprising a computer useable or readable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.

In yet another illustrative embodiment, a system/apparatus is provided. The system/apparatus may comprise one or more processors and a memory coupled to the one or more processors. The memory may comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform various ones of, and combinations of, the operations outlined above with regard to the method illustrative embodiment.

These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the example embodiments of the present invention.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The invention, as well as a preferred mode of use and further objectives and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is an example diagram that illustrates the relationship between machine derating and application derating;

FIG. 2 is an example block diagram of the primary operational elements of a derating tool in accordance with one illustrative embodiment;

FIG. 3 presents a detailed overview of a framework of one illustrative embodiment of the phased multi-stage prediction modeling, evaluation, and estimation framework of soft error vulnerability at a machine micro-architecture level through the various stages of system design;

FIG. 4 is a flowchart representation of a process of estimating the failure in time and rating of a chip in accordance with one illustrative embodiment;

FIG. 5 is a flowchart representation illustrating two approaches for estimating the micro-architectural workload residency of a chip in accordance with one illustrative embodiment;

FIG. 6 is a flowchart representation illustrating an in-depth micro-architectural residency gathering of an IC chip through a simulator in accordance with one illustrative embodiment;

FIG. 7 is a flowchart outlining an example operation for performing machine derating and application derating projections in accordance with one illustrative embodiment;

FIG. 8 is a flowchart outlining an example operation for performing machine derating in accordance with one illustrative embodiment;

FIG. 9 is a flowchart outlining an example operation for performing residency analysis for a basic block in accordance with one illustrative embodiment;

FIG. 10 is a flowchart outlining an example operation for performing application derating in accordance with one illustrative embodiment; and

FIG. 11 is an exemplary block diagram of a conventional dual threaded processor in which aspects of the present invention may be implemented in accordance with an illustrative embodiment.

DETAILED DESCRIPTION

Because of the predicted increase in Soft Error Rates (SERs), it is important to have accurate estimates of failure rates during the design phase of integrated circuit devices. One way to estimate failure rates during the design phase of an integrated circuit device is to perform statistical fault injection (SFI) into a register transfer level (RTL) simulator. SFI involves injecting faults into a simulation by causing one or more registers in the integrated circuit design to flip their state, i.e. flip a bit. The particular register(s) selected may be randomly selected, preselected, i.e. directed faults, or a combination of random and preselected fault injections. One example for performing SFI operations with an RTL simulator is described in Ramachandran et al., “Statistical Fault Injection,” Proceedings of the International Conference on Dependable Systems and Networks (DSN), June 2008.

Using SFI allows the simulation to determine machine derating values for a given application run. The term “de-rating” or “de-rate” refers to the portion of time an integrated circuit design or structure is not in use, or during which it is operating but in a manner that cannot affect an executing workload's correctness. Therefore, it can be said that the structure or design is not susceptible to soft errors during that time period. This is termed de-rating because it reduces the overall opportunity for soft error vulnerability in a design or structure from a baseline or raw SER value (derived from the underlying unit/structure hardware primitives, which does not take the specific implementation usage into account).

Machine derating values are values indicative of a relative amount of errors that are masked, recovered, or otherwise not detectable in the output of the integrated circuit due to the underlying micro-architecture operation. The SFI mechanism simulates the micro-architecture with an injected fault and the resulting output of the simulation indicates the states of the various circuit elements as a result of operating on workloads provided to the simulation in the presence of the injected fault. For example, one can determine that an injected fault is propagated from one circuit element to another and causes certain circuit elements to have an incorrect state, whereas other circuit elements are not affected by the injected fault. In other cases, an injected fault may vanish, be recovered, or otherwise be masked through operation of the micro-architecture and will not negatively affect the operation of the micro-architecture. The relative measure of these different possibilities is referred to herein as the machine derating value or factor. For example, a machine derating value may be a percentage of the number of faults that are masked by the micro-architecture, e.g., 90% masking machine derating factor.

While these SFI mechanisms allow one to obtain machine derating values, SFI mechanisms do not provide information regarding the application level derating component. The application level derating (AD) values are values indicative of a relative amount of faults or errors at the level of the application-visible state that are masked, recovered, or otherwise not detectable in the output of an application, i.e. application level derating deals with state corruption visible at the application level while machine derating deals with state corruption visible at the hardware, or micro-architecture, level. That is, faults injected into the underlying micro-architectural state (e.g., pipeline or staging latches, intermediate tables, buffers and queues, etc.), in particular, soft errors or transient errors induced by cosmic radiation and the like, may not be masked at the micro-architectural state level, but may be masked at the application level. For example, a register (i.e. a latch bank) having an injected fault present in the micro-architecture may not actually be used by the application and thus, would be masked at the application level. As another example, software running on the hardware may actually read or make use of an erroneous (or corrupted) data value but the error may still not affect the integrity of the final calculated values that are generated during the execution of the application. As a result, the injected fault, soft error, transient error, or the like, is masked by the software executing on the hardware.

The total derating value for the system is a combination of both the machine derating and application derating values. The total derating value is a measure of how susceptible the system is to soft errors or transient errors. FIG. 1 is an example diagram that illustrates the relationship between machine derating and application derating. As shown in FIG. 1, an injected fault N_(IF) represents a soft error or transient error that is introduced into a system, such as due to radiation or the impact of alpha particles on the circuitry of the system. This injected fault essentially causes one or more bits in one or more registers of the system to be flipped so that the bit value is the opposite of the intended bit value. Such soft errors or transient errors are simulated by way of the fault injection process. N_(IF) refers to the number of randomly generated fault injection experiments that may be carried out, in order to collect statistics that would yield the probability of causing a “non-masked” incorrect architected state that is visible by the application.

As shown in FIG. 1, the injected faults N_(IF) are processed by the simulation 110 of the micro-architecture. This simulation of the micro-architecture 110 may result in the injected faults N_(IF) vanishing, being recovered or fixed, or a checkstop. The checkstop results in a machine “crash”, where the processor loses all state and it may have to be rebooted before the application execution can be restarted. In the first two cases (“vanished” and “recovered”), the injected error is fully masked by the micro-architecture, in that the application-visible state is not corrupted. For the “checkstop” case, there is a system failure or “crash” but the application state never exhibits an incorrect state. That is, the “checkstop” is a situation where the error was detected by the hardware before any corruption of the architected state, but there was no recovery possible, so the checkstop results in an exception that causes the application, or even the processor itself, to “crash” prematurely.

The first two outcomes, i.e. vanished and recovered, in the processing of the injected fault during the simulation of the micro-architecture result in the injected fault being truly masked at the micro-architecture level. In other words, even though the fault occurred, it did not negatively affect the integrity of the machine operation in any way. For example, a data bit that is flipped in a region of the chip that is never used in the actual computation will effectively “vanish.” Similarly, certain errors (e.g., those in the bits held within the branch history table) might cause a spurious (unintended) branch mis-prediction, but the processor would eventually recover once the branch recovered and hence, there would be no corruption of the architected program state. Some relevant errors are recoverable, in that they are detected by built-in error detectors in the design and either corrected in-place (via the use of error-correction codes) or through specially designed instruction retry mechanisms. Such errors do not affect architected state correctness either, and are categorized as “Recovered.” There is a class of errors that is categorized as “detected but not recoverable.” These are the types of errors that get escalated into a machine “check” or “checkstop” which results in the “crashing” of the system, requiring a reboot or a restart of the application. In this third case (“checkstop”) there is a visible failure, but no incorrect architected state is generated or propagated.
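As a minimal sketch of how the outcome categories above could be turned into a machine derating factor, the following Python fragment tallies the outcomes of a set of fault injection experiments and reports the fraction that were fully masked. The outcome label strings, the function name, and the sample counts are assumptions made for illustration; only the grouping of “vanished” and “recovered” as masked outcomes comes from the description.

```python
from collections import Counter

# Label strings are hypothetical; the grouping follows the text, where only
# "vanished" and "recovered" faults count as masked by the micro-architecture.
MASKED_OUTCOMES = {"vanished", "recovered"}


def machine_derating(outcomes):
    """Return the fraction of injected faults masked at the machine level."""
    if not outcomes:
        raise ValueError("at least one fault injection outcome is required")
    counts = Counter(outcomes)
    masked = sum(counts[label] for label in MASKED_OUTCOMES)
    return masked / len(outcomes)


# Illustrative (made-up) results of 100 fault injection experiments.
sample = (["vanished"] * 80 + ["recovered"] * 10
          + ["checkstop"] * 4 + ["incorrect_state"] * 6)
print(f"machine derating factor: {machine_derating(sample):.0%}")
```

A checkstop is counted here as not masked because it is a visible failure, even though it produces no incorrect architected state; a fuller tally might report it as a separate category.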

In some instances, the injected faults N_(IF) may not be masked at the micro-architecture level and thus, may affect the way in which application workloads are executed on the micro-architecture. Thus, as shown in FIG. 1, the unmasked faults, i.e. the incorrect architected state of the hardware, may cause errors to manifest in the operation of the application, e.g., register values being incorrect or the like. In some cases, the errors that were not masked at the micro-architecture level will not affect the application execution, e.g., in cases where an application does not utilize the registers in which the error is manifest in the micro-architecture, and thus, the faults or errors are masked at the application level simulation 120. In other cases, the errors at the micro-architecture level simulation 110 cause errors to be introduced into the application level simulation 120. Such application level errors may be software detected errors, silent data corruption, or the like.

It can be seen from FIG. 1 that if one were not to consider the effects of application level error masking/non-masking, a full picture of the susceptibility of the system to soft errors and transient errors is not achieved. In some instances, application level error masking/non-masking, i.e. application derating, may account for a significant reduction in a failures in time (FITs) estimate. Not taking into account the application derating leads to a FITs estimate that is too conservative, leading to over-design and higher power consumption. Thus, it is important to gain an understanding of the effects of soft errors or transient errors, as modeled by fault injection, at both the micro-architecture level (machine derating) and the application level (application derating) to obtain an understanding of the susceptibility of a system to such soft errors or transient errors.
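To illustrate why ignoring application derating yields an overly conservative estimate, the sketch below combines a raw FIT value with a machine masking fraction and an application masking fraction. The multiplicative model is an assumption made for this example, not a formula stated in the description: it treats the application masking fraction as applying only to the faults that were not masked by the micro-architecture. The function name and all numbers are hypothetical.

```python
def derated_fit(raw_fit, machine_masking, application_masking):
    """Scale a raw FIT value by machine and application masking fractions.

    Assumed model (not taken from the description): faults that survive the
    micro-architecture are masked at the application level with probability
    `application_masking`, independently of how they survived.
    """
    if raw_fit < 0:
        raise ValueError("raw_fit must be non-negative")
    for name, value in (("machine_masking", machine_masking),
                        ("application_masking", application_masking)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    reaches_application = raw_fit * (1.0 - machine_masking)
    return reaches_application * (1.0 - application_masking)


# Made-up figures: 1000 raw FIT, 90% machine masking, 50% application masking.
machine_only = derated_fit(1000.0, 0.90, 0.0)
combined = derated_fit(1000.0, 0.90, 0.50)
print(f"machine derating only:     {machine_only:.1f} FIT")
print(f"machine + application:     {combined:.1f} FIT")
```

Under these assumed numbers the estimate that omits application derating is twice the combined estimate, which is the kind of gap that can drive over-design.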

Since both machine derating and application derating values are important to the evaluation of the system as a whole, it is important to have tools available to evaluate both at a pre-silicon stage, i.e. prior to fabrication of the integrated circuit device, so that a reliability of the integrated circuit device design can be determined prior to fabrication. However, there are few solutions for determining application level derating (AD) values, let alone doing so in combination with determining machine derating (MD) values.

One solution for deriving AD values is to use a functional (full-system) simulator to analyze application sensitivity to transient errors. That is, a fault may be injected into a micro-architecture design simulator to thereby simulate a soft error or transient error in the micro-architecture. The running of an application on the micro-architecture design may then be simulated by simulating the entire system. The effects of the injected fault may then be observed in the output of the simulation to derive an application derating value. This solution is very slow and does not allow a broad set of workloads to be characterized because of the practical issue of simulation time involved.

As such, a new solution that is able to derive MD and AD values at, or close to, hardware execution speed would be a valuable addition to current pre-silicon reliability projection methodologies. The illustrative embodiments provide an integrated framework to yield machine derating and application derating values for a target application on a target integrated circuit device at an early-stage chip and system design phase, i.e. pre-silicon stage using existing hardware to approximate a new system design. The illustrative embodiments leverage hardware profiling tools to perform fault injection and control processes within a hardware device to facilitate calculation of machine and application derating factors. In this way, the illustrative embodiments provide a single unified tool to deduce individual contributions, e.g., MD and AD and attendant sub-components, separately.

The mechanisms of the illustrative embodiments, in one phase of operation, utilize a profiler and residency analyzer to obtain machine derating information for an integrated circuit device design. The illustrative embodiments further provide, in a second phase of operation, an application fault injector that is used to inject faults into an existing hardware device. The existing hardware device executes an application workload and results of the execution are provided to backend mechanisms. The backend mechanisms are utilized to collect application profile information and residency information from the hardware device to thereby approximate the machine derating for the integrated circuit device design being evaluated. The backend mechanisms further collect state information from the existing hardware to approximate the application derating information for the integrated circuit device design being evaluated. The machine derating information and the application derating information are then combined by the backend mechanism to generate one or more soft error rate (SER) failures/faults in time (FIT) projections for the integrated circuit device design. These SER FIT values may be logged or otherwise stored in a data file for later use, output to a human user for review, or otherwise output to a system for use in identifying areas of the integrated circuit device design that are susceptible to soft error faults and possibly in need of a redesign to reduce their susceptibility to soft errors.

The net system-level SER FITs value(s) that is/are determined in this manner may be inverted to project the integrated circuit device design's mean time to failure (MTTF). If the projected MTTF value is too small (i.e. below the target quality level or a pre-determined target MTTF value of the intended product) then the integrated circuit device design may be modified or reworked to make sure that the SER FITs value is reduced. This may be accomplished, for example, by selective “hardening” of latches within a unit that has a particularly high SER FITs value, or by adding additional error detection and recovery features into the design. The modified design is then re-evaluated by the pre-silicon SER analysis mechanisms of the illustrative embodiments. This iterative process is repeated until the projected MTTF meets the targeted product quality, while satisfying other metrics like power consumption limits, performance, cost, and the like.
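The inversion from a FIT value to an MTTF can be sketched as follows, using the standard definition of one FIT as one failure per 10^9 device-hours and assuming a constant failure rate. The function names, the 200-year target, and the FIT values for the successive design revisions are hypothetical and serve only to illustrate the iterative check described above.

```python
HOURS_PER_BILLION = 1e9      # one FIT is one failure per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365    # calendar approximation used for reporting


def mttf_hours(fit):
    """Project mean time to failure (hours) from a FIT value."""
    if fit <= 0:
        raise ValueError("fit must be positive")
    return HOURS_PER_BILLION / fit


def meets_target(fit, target_mttf_years):
    """Return True when the projected MTTF reaches the target, in years."""
    return mttf_hours(fit) / HOURS_PER_YEAR >= target_mttf_years


# Hypothetical FIT projections for successive revisions of a design, e.g.
# after hardening latches in the unit with the highest SER FITs value.
revision_fits = [2000.0, 800.0, 400.0]
target_years = 200.0
for revision, fit in enumerate(revision_fits, start=1):
    years = mttf_hours(fit) / HOURS_PER_YEAR
    status = "meets target" if meets_target(fit, target_years) else "rework"
    print(f"revision {revision}: {fit:.0f} FIT, MTTF {years:.1f} years, {status}")
```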

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in any one or more computer readable medium(s) having computer usable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in a baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Computer code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination thereof.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java™, Smalltalk™, C++, or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the illustrative embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions that implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

FIG. 2 is an example block diagram of the primary operational elements of a derating tool in accordance with one illustrative embodiment. As shown in FIG. 2, the derating tool 200 comprises a machine derating front-end engine 210 and an application derating front-end engine 220. The derating tool 200, and thus, the front-end engines 210 and 220, may be implemented, for example, as software instructions stored in one or more computer readable storage devices, e.g., memories, hard disk drives, flash memory drives, solid state disks, CD-ROMs, DVDs, etc., that store the instructions for later retrieval and execution. These instructions may be retrieved from the one or more computer readable storage devices and then executed by one or more data processing devices, e.g., processors or the like. However, it should be appreciated that a purely hardware embodiment or embodiments comprising a combination of hardware implemented algorithms and software implemented algorithms may be used without departing from the spirit and scope of the illustrative embodiments.

The machine derating front-end engine 210 comprises a profiler engine 212 and a residency analyzer 214. The machine derating front-end engine 210 receives as input an application workload 240 as well as the new integrated circuit device design parameter file 230. The application workload 240 is a set of instructions to be executed on the new integrated circuit device having the design as specified in the new integrated circuit device design parameter file 230. The execution of the application workload 240 in this new integrated circuit device design is simulated by the machine derating front-end engine 210. The machine derating front-end engine 210 models soft errors at the machine or micro-architecture level using the information about the integrated circuit device design from the parameter file 230 and information about the application workload 240. That is, a simulation is executed by the machine derating front-end engine 210 which generates a machine or micro-architecture soft error rate (SER) faults in time (FIT) projection for the new integrated circuit device design. This simulation utilizes the profiler engine 212 to profile the application workload 240 to obtain information about the types, and numbers of each type, of instructions executed as part of the application workload 240. The simulation further utilizes the residency analyzer 214 to determine residency statistics for elements of the new integrated circuit device design specified in the parameter file 230, e.g. a functional unit-wise latch profile 270. The residency statistics basically identify the portion of the execution time of the application workload 240 during which each functional unit in the design is busy. The simulation performed by the machine derating front-end engine 210 may be executed on the same existing hardware 250 as the application derating front-end engine 220 operates or on a different data processing device. More details regarding one illustrative embodiment for implementing the machine derating front-end engine 210 will be provided hereafter with reference to FIGS. 3-6.
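The residency statistics described above can be illustrated with a short sketch that converts per-unit busy cycle counts into the fraction of execution time each functional unit is busy. The function name, the unit names, and the cycle counts are hypothetical; an actual residency analyzer would derive such counts from the profiled workload and the design parameter file rather than from a hand-written dictionary.

```python
def residency(busy_cycles, total_cycles):
    """Return, per functional unit, the fraction of cycles the unit is busy."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    stats = {}
    for unit, busy in busy_cycles.items():
        if not 0 <= busy <= total_cycles:
            raise ValueError(f"busy cycles for {unit} fall outside the run length")
        stats[unit] = busy / total_cycles
    return stats


# Made-up busy cycle counts for a workload that runs for one million cycles.
profile = {
    "fixed_point_unit": 600_000,
    "floating_point_unit": 150_000,
    "load_store_unit": 450_000,
}
stats = residency(profile, 1_000_000)
for unit, fraction in stats.items():
    print(f"{unit}: busy {fraction:.0%} of execution time")
```

A unit that is idle for most of the run offers correspondingly less opportunity for a soft error to affect the workload, which is why residency feeds into the machine derating estimate.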

In addition to the machine derating performed by the machine derating front-end engine 210, the derating tool 200 provides an application derating front-end engine 220 which injects faults into an existing hardware 250. The existing hardware 250 may be an existing integrated circuit device that is sufficiently similar to the new integrated circuit device design that monitoring the existing hardware's operation with regard to the application workload 240, in view of injected faults, provides a good approximation of how the new integrated circuit device design would operate under the same conditions with the same application workload 240. For example, in one illustrative embodiment, the integrated circuit device is a processor chip and the existing hardware 250 may be an existing processor chip in a same family of processor chips as the new processor chip design specified by the parameter file 230, e.g., an IBM P6 processor chip may be the existing hardware 250 while the parameter file 230 may specify the design of an IBM P7 processor chip that is in the same family of processor chips as the P6 processor chip, where a same “family” essentially means that the micro-architectures of the chips are sufficiently similar and the instruction set architectures are sufficiently similar that they may be regarded as being related to one another. The existing hardware 250 utilizes a same or significantly similar instruction set architecture (ISA) as the new integrated circuit device design specified in the parameter file 230 so that the execution of the application workload 240 on the existing hardware 250 accurately approximates the execution of the same application workload 240 on the new integrated circuit device design 230.

With the application derating front-end engine 220, the application workload 240 is not simulated but actually executed on the existing hardware 250. The execution of this application workload 240 is initiated but then stopped, or interrupted, by the architected state reading engine 222 early in the execution to read the architected state of the application workload 240 in the existing hardware 250, e.g., register states and the like.

Having read the architected state at the initialization of the application workload 240, i.e. prior to the application workload 240 actually executing, the application derating front-end engine 220 injects one or more faults into the existing hardware 250 by modifying the architected state of one or more elements of the existing hardware 250 and writing the modified state back to the one or more elements of the existing hardware 250. The actual modification of the state of the selected element(s) is performed by an architected state modify and write engine 224 of the application derating front-end engine 220, for example. The particular elements whose state is modified may be specified in fault injection parameters 260 provided as input to the application derating front-end engine 220, i.e. a directed fault injection, or may be randomly or pseudo-randomly determined by the application derating front-end engine 220 in any suitable manner. Moreover, a combination of directed and random or pseudo-random selection of elements of the hardware 250 may be utilized as well. For example, the application derating front-end engine 220 may randomly select one or more registers in the hardware 250 to have their bit states flipped, i.e. a state that was a 0 is flipped to be a 1 or vice versa.
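By way of a non-limiting sketch, the modify-and-write step described above may be pictured as flipping a single bit in a saved copy of the architected state. The dictionary of register values, the 64-bit register width, and the names `flip_bit` and `inject_fault` are assumptions made for illustration only; they are not the interface of the architected state modify and write engine 224.

```python
import random


def flip_bit(value: int, bit: int, width: int = 64) -> int:
    """Return value with one bit inverted, kept within the register width."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside register width")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def inject_fault(state, rng, target=None, width: int = 64):
    """Flip one bit in a copy of the architected state.

    `target` names a register for a directed injection; when it is None,
    a register is selected pseudo-randomly. The register names and the
    64-bit width are hypothetical.
    Returns (modified_state, register_name, bit_index).
    """
    name = target if target is not None else rng.choice(sorted(state))
    bit = rng.randrange(width)
    modified = dict(state)  # the state that was read out is left untouched
    modified[name] = flip_bit(state[name], bit, width)
    return modified, name, bit


# Hypothetical architected state read out at the interruption point.
state = {"r0": 0, "r1": 0xFF, "r2": 42}
faulty, reg, bit = inject_fault(state, random.Random(1))
directed, directed_reg, _ = inject_fault(state, random.Random(0), target="r2")
```

Seeding the generator makes a pseudo-random injection repeatable, which may be convenient when a particular masked or unmasked outcome needs to be reproduced.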

Once the fault(s) is/are injected into the hardware 250 in this way, the application workload 240 execution on the hardware 250 is restarted. At some time later, e.g., at a specified time interval, at conclusion of the execution of the application workload 240, and/or the like, the state of the hardware elements is read and stored in the storage 226. The state information obtained after injection of the fault is indicative of how the fault affects the execution of the application workload 240 and whether the injected fault is masked or not masked by the execution of the application. Thus, multiple sets of state information may be gathered during the execution of the application workload 240 so as to determine how the injected fault affects the operation of the hardware 250 with regard to the execution of the application workload 240.

It should be noted that during each application derating operation performed by the application derating front-end engine 220, a plurality of fault injections may be performed. Each fault injection may be a same fault injection, i.e. a same element whose state is modified to simulate a fault, or may be a different fault injection, i.e. a different element whose state is modified to simulate a different fault. Thus, for example, based on a particular time interval specified in the fault injection parameters 260, the execution of the application workload 240 on the hardware 250 may be interrupted periodically, the state of the hardware 250 may be read out by the architected state reading engine 222 and stored in the storage 226, and may then be modified and written by the architected state modify and write engine 224. This may simulate, for example, multiple impacts of particles, the effects of radiation, and the like, on the hardware 250 that may cause soft errors or transient errors to occur in the hardware 250.

The particular parameters for performing the fault injection are specified in the fault injection parameters input 260 that is provided to the application derating front-end engine 220. For example, these parameters may specify one or more elements, e.g., registers or the like, in the hardware 250 into which the fault is to be injected, that the fault injection is to be performed on one or more randomly or pseudo-randomly selected elements, a particular manner or algorithm for performing the selection, a time interval for injecting the fault(s) (the time interval may be fixed or variable), and other fault injection parameters that may be used to direct the way in which fault injection is performed.

The machine derating and application derating performed by the derating tool 200 may be performed multiple times to obtain statistical measures, e.g., probabilities, percentages, etc., with regard to the effects of soft errors and transient errors at the machine and application levels, e.g., statistical measures of masking/non-masking of soft errors/transient errors. This information is output to the backend engine 290 as the machine derating information 270 and application derating information 275. For example, the machine derating front-end engine 210 generates, via the hardware 250 or other data processing device (not shown), machine derating information 270 and functional unit-wise latch profile information 280, which are both input to the backend engine 290 to generate a machine derating soft error rate (SER) faults in time (FIT) projection component for the new integrated circuit device design. The application derating front-end engine 220 injects faults into the hardware 250 and determines the masking/non-masking of these faults at the application level to generate application derating information 275, which is input to the backend engine 290 to generate an application derating SER FIT projection component. These two components are combined to obtain a total SER FIT projection 295 for the new integrated circuit device design. For example, the SER FIT projection 295 may be a worst case SER determined based on a product of the number of latches and the per-latch FIT rate as determined from the machine derating and application derating information 270 and 275.
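As one hedged illustration of the statistical measures mentioned above, the fraction of injected faults that are masked may be estimated by comparing the result of each fault-injected run against the result of a fault-free run. The comparison against a fault-free reference, the use of strings as run results, and the name `masking_fraction` are assumptions for this sketch; the description above does not prescribe how masking is determined.

```python
def masking_fraction(reference, observed_runs):
    """Fraction of fault-injected runs whose result matches the reference.

    `reference` is the result of a fault-free run (an assumption of this
    sketch); `observed_runs` holds one result per fault-injected run.
    A matching result is counted as a masked fault.
    """
    if not observed_runs:
        raise ValueError("at least one fault-injected run is required")
    masked = sum(1 for result in observed_runs if result == reference)
    return masked / len(observed_runs)


# Hypothetical results from four fault-injected executions of a workload.
runs = ["checksum=9f", "checksum=9f", "checksum=13", "checksum=9f"]
application_masking = masking_fraction("checksum=9f", runs)
```

In this hypothetical data set three of the four injected faults leave the result unchanged, so the estimated masking fraction is 0.75. More runs would be needed before such a figure could be treated as a stable estimate.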

As mentioned above, the illustrative embodiments provide a unified tool 200 for performing both machine derating analysis and application derating analysis using existing hardware 250 at hardware speeds. That is, the application derating, which might otherwise be performed using software based simulation, is performed in actual existing hardware at hardware speeds. Furthermore, with the mechanisms of the illustrative embodiments both components of SER FITs projections are taken into consideration, i.e. machine derating and application derating, thereby providing a more comprehensive projection of the actual operation of the new integrated circuit device design.

As mentioned above, one component of the derating tool 200 is the machine derating front-end engine 210 which performs a machine level simulation of the integrated circuit device design defined by the parameter file 230. There may be many different ways to implement such a machine derating front-end engine 210, any of which are intended to be within the spirit and scope of the illustrative embodiments. FIGS. 3-6 are provided hereafter as an example of some illustrative embodiments in which a “phaser” mechanism is used to implement the machine derating front-end engine 210. An example of such a “phaser” front-end is described in commonly owned and co-pending U.S. patent application Ser. No. 12/243,427, which is hereby incorporated by reference.

As described in the co-pending U.S. patent application Ser. No. 12/243,427, the tool for modeling system level effects of soft errors at the machine level, i.e. the machine derating front-end engine 210, integrates device-level and component-level soft error rate (SER) analysis with micro-architecture level performance analysis tools during the early phase of integrated circuit device design, e.g., the concept phase of the design. For purposes of the description of the illustrative embodiments, it will be assumed that the integrated circuit device being designed is an integrated circuit (IC) chip. The integration of device-level and component-level SER analysis with microarchitecture-level performance analysis tools at the concept phase allows designers to study key performance-power-reliability trade-offs. In particular, besides the modeling tool of the illustrative embodiments projecting SER derating factors and corresponding SER FIT values for the IC chip and its various components, the modeling tool framework also allows the designers to undertake “what-if” evaluations and comparisons, with a focus on adopting various latches and cells from a design library in the various units based on their respective SER vulnerability characteristics. The modeling tool framework further allows an architecture definition team to decide on the exact style and level of micro-architectural redundancy that may be needed to achieve per-chip SER FIT targets.

In later stages of the design, as the design reaches the register transfer level (RTL) mode, the IC chip SER profile is refined, as more accurate information about the unit-wise latch distributions, latch types, and SER vulnerabilities of logic and latch elements becomes available. Though in these later design phases major micro-architecture paradigm changes are generally not made, the analysis derived from the modeling tool framework of the illustrative embodiments aids in adjusting the relative protection levels and latch-types across highlighted units of the IC chip design.

FIG. 3 presents a detailed overview of a framework 300 of one illustrative embodiment of the phased multi-stage prediction modeling, evaluation, and estimation framework of soft error vulnerability at a machine microarchitecture level through the various stages of system design. This framework 300 may be utilized as a machine derating front-end engine 210 of a derating tool 200 as shown in FIG. 2, for example. The main methodology behind each of the phases 310, 320, 330, and 340 in the framework 300 is similar, with the additional impetus that as the design moves from the pre-concept phase 310 to the next higher phases 320, 330, and 340, respectively, the information available and its accuracy improve, making it possible for the modeling accuracy to keep improving. In particular, the projections and predictions made in the previous phase can be used in a loop-back manner to improve the next phase projections and predictions, and vice versa.

As illustrated in the framework 300 in FIG. 3, there are four basic models of the integrated circuit device under design, e.g., an IC chip for purposes of this description, around which the soft error rate (SER) modeling is focused at the various design phases. These particular models are: (a) the M0 model 312 at the pre-concept phase 310; (b) the M1 model 322 at the concept phase 320; (c) the M2 model 332 during the high-level design (HLD) phase 330; and (d) the M3 model 342 during the register transfer level (RTL) implementation phase 340. The four models M0, M1, M2, and M3, may be provided as parameter files, such as parameter file 230 in FIG. 2, for example. These four models, M0, M1, M2, and M3 have corresponding IC chip SER workload modeling engines: Phaser/M0 314, Phaser/M1 324, Phaser/M2 334, and Phaser/M3 344, where the latch/cell types and counts, with their corresponding vulnerabilities, are combined together with the active workload residencies of the chip or component to produce respective increasingly accurate SER projections 318, 328, 338, 348, and 352. At each of the Phaser/Mi components, with i=0, 1, 2, 3, a lateral iteration 316, 326, 336, and 346 of improving the raw SER modeling and workload residency aids the accuracy of the SER projection at that stage.

Workload residency is a measure of the opportune proportion of cycles during a workload execution for which bit-flip events could alter the program correctness. In effect, workload residency is a measure of a logic element's (average) susceptibility to soft errors; measuring the cycles during which the logic element is working on correct-path instruction execution which could affect the correctness of the workload output versus the total number of cycles of execution. The residency can be measured for logic elements, storage elements, etc. and may be measured at various levels of granularity from single transistors through larger accumulations (e.g. logical units, etc.). Residency is similar to the familiar metric of utilization, but with the additional restriction that only the utilized cycles in which the data stored or logic being computed could result in alterations of the final workload output (i.e. a soft error) are considered.

The M0 model 312 is an analytical performance model, e.g., a spreadsheet or the like, or a very early “cycle-approximate” simulator that is adapted from an earlier generation cycle-accurate M1 performance model 322. As the design definition progresses to the concept phase 320, the architecture team arrives at a more definite view of the processor core and chip-level micro-architecture. At this stage, the framework 300 leverages the M1 (cycle-accurate performance) model 322 for the core to build the SER analysis tool. Later, during the HLD phase 330, the M1 performance model 322 is replaced by a “latch-accurate” M2 model 332, where the inter-unit interfaces are accurately modeled in terms of the exact latch counts. The intra-unit execution semantics are still written in a behavioral format, e.g., using C/C++ type languages, as in the M1 model 322, for example. During the HLD phase 330 of the design, the corresponding chip SER workload modeling paradigm, Phaser/M2 334, is able to model the inter-unit error propagation effects more accurately since those interface latches and their switching activities are directly observable during the simulation of specific workloads.

During the RTL implementation phase 340, the framework SER analysis moves over to link up with the RTL (M3) model 342 which contains detailed logic, latch, and timing information for the full processor. In addition to VHDL cycle-accurate software simulation (which is rather slow), the framework 300 has the facility for using significantly accelerated AWAN hardware simulation 350 of the RTL model 342, which allows for running full benchmarks if necessary, at the RTL detail level. AWAN hardware simulation 350 is described in Ludden et al. “Functional Verification of the POWER4 Microprocessor and POWER4 Multiprocessor Systems,” IBM Journal of Research & Development, Vol. 46, No. 1, pages 53-76, 2002, which is hereby incorporated by reference. At the RTL implementation phase 340 of the design, as the RTL approaches full functionality, the framework 300 can also leverage validation/calibration support from statistical fault injection (SFI) approaches.

The details of the Phaser/Mi components of the framework 300, where i=0, 1, 2, 3, will now be described. Since the various phases of the framework 300 are not fundamentally different when considered from a high level, the methodology of the framework 300 will be illustrated through an in-depth generalized discussion of Phaser/M1. It is assumed that the design has a cycle-accurate M1 microarchitecture simulator and preliminary design VHDL code available, with clear knowledge and choices of the various technology elements, e.g., latches, combinational logic, and memory cells, along with the technology parameters that govern their behavior.

As Li et al. have shown in “Architecture-Level Soft Error Analysis: Examining the Limits of Common Assumptions,” Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Edinburgh, U.K., 2007, pages 266-275, which is hereby incorporated by reference, for practical ranges of the native per-bit SERs observable at sea level, and for modeled systems with tens of components (units), a simpler, post-processing approach that uses an instrumented simulator to collect average workload residency statistics per workload run can yield sufficiently accurate per-unit and total system SER. Such an approach is based on two steps. In a first step, a per-unit average architectural vulnerability factor (AVF), as described in Mukherjee et al., “A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor,” Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, San Diego, Calif., 2003, pages 29-40, which is hereby incorporated by reference, is estimated. The AVF is then multiplied by the unit maximum (unmasked or raw) SER to project the real, i.e. derated, SER of the unit as actually manifested in program behavior. In a second step, unit-level error rates are added to derive the chip-level SER value, which is referred to generally as the sum of failure rates (SOFR). However, it is important to stress that the accuracy of the unit-wise and total manifested failure rates depends on how the AVFs are collected.

In the Phaser/Mi components of the illustrative embodiments, a post-processing approach is used to collect all the required average residency statistics at the end of a workload run. Those statistics are then combined with the detailed information of per-unit latch distributions of specific types and protection levels, along with native technological data related to raw per-bit SER values. Thus, the illustrative embodiments use new data (workload residency, etc.) and metrics (raw FITs, etc.) to obtain a measure of the IC design (i.e. susceptibility to SER) that is new relative to previously known post-processing mechanisms. The AVF/SOFR approach estimates the SER of an IC chip or system in two steps. The first step, i.e. the AVF step, estimates the SER of the individual components under the basic assumption that the probability of failure is uniform across a program execution. Hence, the SER of a given component in a processor chip is simply the fraction of time it holds useful work and/or data multiplied by the raw SER of the component. This fraction of time is referred to as the component value of data residency. The second step, i.e. the SOFR step, estimates the SER of the entire IC chip or system by adding together the individual SER values of the constituent components under the general assumption that the inter-arrival time for failures is exponentially distributed.
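The two-step AVF/SOFR estimate described above can be sketched as follows. The unit names, raw FIT values, and residency values are hypothetical figures chosen only to make the arithmetic easy to follow, and the function names `component_ser` and `chip_ser` are likewise illustrative.

```python
def component_ser(raw_fit: float, residency: float) -> float:
    """AVF step: raw SER scaled by the fraction of time useful data is held."""
    if not 0.0 <= residency <= 1.0:
        raise ValueError("residency must lie between 0 and 1")
    return raw_fit * residency


def chip_ser(units) -> float:
    """SOFR step: add together the derated SER values of all components."""
    return sum(component_ser(raw, res) for raw, res in units.values())


# Hypothetical (raw FIT, data residency) pairs for two units.
units = {
    "FXU": (100.0, 0.25),
    "IDU": (200.0, 0.5),
}
total_fit = chip_ser(units)
```

With these assumed inputs the fixed-point unit contributes 25 FIT and the instruction decode unit contributes 100 FIT, for a chip-level estimate of 125 FIT.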

Depending on the particular design phase, the methodology derives the residency factors from the corresponding simulation model, e.g., M0 312, M1 322, M2 332, or M3 342 in FIG. 3. In one illustrative embodiment, the framework 300 uses a detailed method to calculate the architectural residency factors that can be implemented more practically than known methodologies while preserving accuracy. For example, a systematic method to monitor only useful register file residencies, i.e., those that contribute to actual instruction completion and modification of the architected register and memory state, is utilized. The term “architected” refers to the components of the machine that are accessible by software including memory, register files, special-purpose registers, and the like. The measured residency data is combined with the various latch, logic, and cell raw SERs of the targeted IC chip in a systematic manner to project the de-rating factors as well as the overall SER.

Referring now to FIG. 4, a flowchart representation of a process of estimating the failure in time and rating of a chip in accordance with one illustrative embodiment is depicted. The process outlined in FIG. 4 may be performed, for example, by the Phaser/Mi in FIG. 3. A quick inspection of the process 400 demonstrates that a considerable amount of data and a number of factors from various sources are factored and merged together to achieve a realistic SER and de-rating prediction model. Generally, the framework 300 revolves around two major approaches: (1) estimating the raw SER of the targeted IC chip or system; and (2) deriving the average residency of the typical workload executing on the same IC chip or system. These two derivations are used to accurately predict the de-rating factor and the projected SER of the IC chip or system under study.

The raw SER of an IC chip is defined as the expected total SER assuming that the chip is busy 100% of the time and that every bit or cell upset that occurs during its operation leads to a manifested soft error. Accurate raw SER modeling of an IC chip or its components utilizes an in-depth knowledge of the constituent latches, array cells, and combinational logic with respect to counts and types as well as their associated vulnerabilities to soft errors. As illustrated by the process 400 in FIG. 4, the framework 300 of the illustrative embodiments gathers this element and technology information 420 of the logic 421, latch 422, and SRAM 423 counts, as well as type data, from a design database 410, which may be maintained, for example, in a storage device associated with the data processing system. In addition, information on the ratio of latches 424 and memory cells 425 protected against errors, capable of recovering from errors, as well as those that will only signal the occurrence of errors, are also taken from the design database 410. Since the framework 300 is multi-phase, it is expected that this data is updated frequently as more accurate design data becomes available with the maturity of the design.

The Phaser/Mi component modeling computes the contribution of SER by each of the elements, e.g., logic, latches, SRAM, and the like. As shown in the process 400 in FIG. 4, the raw SER contribution of combinational logic 430 is basically the summation of all gates multiplied by the respective gate's native raw FIT. Essentially, the raw SER/FIT values are simple measures of the susceptibility of the basic circuit/device structures, given the technological parameters of the device (e.g., sizes of gate, oxide thickness, voltage level, doping, etc.), to be affected by various quanta of “noise” (i.e. energy that is not a specifically intended data value). In the development of a digital circuit in a specific technology (e.g., a 45 nm CMOS process) there is a technology team that defines a set of basic elements (the device library, which includes bit latches, basic gates, mux elements, etc.) that are used to implement the desired design. This design library team also runs various experiments, including electrical simulations, etc. to determine the raw susceptibility of the devices to these noise effects (of which an external strike by a galactic particle, e.g., an alpha particle, etc. is one possible source). Devices that are run closer to the threshold voltage, or with narrower gate channels, etc. are more susceptible, and the designers have the option to choose among a variety of different devices with different SER/FIT characteristics (among others, e.g., speed, power, etc.).

With regard to the mechanisms of the illustrative embodiments, the generation of raw SER/FIT information is a precursor to the operation of the mechanisms of the illustrative embodiments. This information is generally kept in the form of a design database (i.e. as attributes in the design library, or externally in a database, etc.) and may or may not change occasionally as the technology becomes better characterized, or as new experiments on the basic circuit data provide better information about the raw FIT of devices. This data is available from each technology foundry/fabrication facility, for each design library supported. The illustrative embodiments take in this raw SER/FIT information and operate upon it in the manner described herein.

The raw SER contribution of latches 440 is the summation of all latch bits multiplied by the respective latch bit's native raw FIT. The raw SER contribution of SRAM or storage structures 450 is the summation of all memory/storage cells multiplied by the respective cell's native raw FIT. The summation of these values 430, 440, and 450 gives the total worst case Raw FITs 480 of the IC chip or system under study.
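The worst case raw FIT summation described above may be sketched as a count-times-rate sum per element class. The element types, counts, and per-element FIT values below are hypothetical and are not taken from any design database; the names `raw_fit` and `total_raw_fit` are illustrative.

```python
def raw_fit(elements) -> float:
    """Sum of count times native raw FIT over every element type in a class."""
    return sum(count * fit_per_element for count, fit_per_element in elements)


def total_raw_fit(logic, latches, sram) -> float:
    """Worst case raw FIT: logic, latch, and SRAM contributions added together."""
    return raw_fit(logic) + raw_fit(latches) + raw_fit(sram)


# Hypothetical (count, raw FIT per element) entries, one per element type.
logic = [(3000, 0.25), (1000, 0.25)]  # two gate types
latches = [(2000, 0.5)]               # one latch type
sram = [(8000, 0.125)]                # one memory cell type

worst_case = total_raw_fit(logic, latches, sram)
```

Each class contributes 1000 FIT in this made-up example, giving a total worst case value of 3000 FIT before any de-rating is applied.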

However, it should be noted that without even taking a workload running on the IC chip or system into account, there are still elements which do not, or cannot, contribute to the raw SER of the IC chip. The basic reasoning is that some of the elements 420 in the IC chip are protected against soft errors, such as through error correcting codes (ECC) or parity checking, and can either recover from an occurrence of a soft error or be able to signal such occurrence for the necessary mitigating processes to proceed. In addition, there is empirically established knowledge that due to logic-level masking effects in a combinational logic chain or cone, soft error upset events on gates in levels 4 and beyond generally do not contribute to manifested errors in the receiving latch bank in real microprocessor pipelined logic paths. Hence, the total worst case Raw FITs discussed above can be further de-rated or masked based on the recognition of elements within the elements 420 that do not, or cannot, contribute to the raw SER of the IC chip or system.

As previously mentioned above, the term “de-rating” or “de-rate” refers to the portion of time an IC chip unit or structure (whether logic or storage) is not in use, or during which it is operating but in a manner that cannot affect an executing workload's correctness. Therefore, it can be said that the structure or unit is not susceptible to soft errors during that time period. This is termed de-rating because it reduces the overall opportunity for soft error vulnerability in a unit or structure from a baseline or raw SER value (derived from the underlying unit/structure hardware primitives, which does not take the specific implementation usage into account). For example, a microprocessor unit with a de-rating factor of 75% over a given workload execution implies that such a unit is susceptible to errors only 25% of the total execution time of the workload run, which factor is used to reduce the baseline per-cycle susceptibility by that proportion. De-rating factors result from a wide set of behaviors, from low-level electrical factors (e.g., the latch duty cycle) through high-level effects (e.g., the instruction set architecture and programming conventions). Overall, de-rating refers to any of the factors that reduce the raw SER from the base value.

To further de-rate or mask the total worst case Raw FITs, the design planned protected latch ratio 424 and protected cell ratio 425 data, gathered from the design database 410, as well as information on the gate levels in the various logic chains, are used. In particular, the combinational logic chain SER 430 is de-rated further by knowing the actual levels and composition counts of logic gates and being able to extract the proportion of the level zero to level three gates, by type and count, that matter. For the levels and counts of the combinational logic gates within the logic chains, such data is gathered by applying a VHDL dissecting tool, e.g., a Vtiming tool 461, on the evolving register transfer level (RTL) model of the IC chip or system. An example of a Vtiming tool 461 that may be utilized with the mechanisms of the illustrative embodiments is described in Kudva et al., “Early Performance Prediction,” Proceedings of the Workshop on Complexity-Effective Design: Held in Conjunction with the 32nd International Symposium on Computer Architecture, Madison, Wis., 2005 (see www.csl.cornell.edu/albonesi/wced05/wced05.pdf), which is hereby incorporated by reference.

The Vtiming tool 461, which may be part of the profiler 212 in FIG. 2, for example, scans the VHDL description of the IC chip or component and gathers statistics about the number and types of logic gates within each level of a given combinational logic chain. The Vtiming tool 461 is able to provide an estimate of the length and number of gates in various levels of logic of a cone of logic in the early stage RTL description without the requirement of a gate level design or synthesis. Based on the number of signals that are in the input set of an output or state signal, an estimate of the number and sensitivity of combinational logic in the design can be made by the Vtiming tool 461. Using this information, the raw FITs contributed by the combinational logic 430 are further de-rated or masked 460 by considering only the gates in level zero to level three of the design. The de-rated combinational logic raw FITs 460 are then combined with the latch and SRAM raw FITs 440 and 450, de-rated by the protected latch ratio 424 and protected cell ratio 425, respectively. The resulting intermediate FITs 470 value is considered purely micro-architecture dependent, before the effects of workload behavior are considered.
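A minimal sketch of the intermediate FITs computation follows. It assumes that de-rating by a protected ratio means removing the protected fraction of the latch or cell contribution, and that a single raw FIT value applies to every gate; both are simplifying assumptions of this sketch rather than statements of the described method. The gate counts, FIT values, ratios, and function names are hypothetical.

```python
def derated_logic_fit(gates_by_level, fit_per_gate: float, max_level: int = 3) -> float:
    """Raw FIT of combinational logic counting only gates in levels 0..max_level."""
    counted = sum(count for level, count in gates_by_level.items() if level <= max_level)
    return counted * fit_per_gate


def intermediate_fit(logic_fit, latch_raw_fit, sram_raw_fit,
                     protected_latch_ratio, protected_cell_ratio) -> float:
    """Combine de-rated logic FIT with the unprotected latch and SRAM FIT.

    Assumption: the protected fraction of latches and cells contributes nothing.
    """
    unprotected_latch = latch_raw_fit * (1.0 - protected_latch_ratio)
    unprotected_sram = sram_raw_fit * (1.0 - protected_cell_ratio)
    return logic_fit + unprotected_latch + unprotected_sram


# Hypothetical gate counts per logic level, as a Vtiming-style scan might report.
gates_by_level = {0: 100, 1: 200, 2: 300, 3: 400, 4: 500, 5: 600}
logic_fit = derated_logic_fit(gates_by_level, fit_per_gate=0.5)
micro_arch_fit = intermediate_fit(logic_fit, 1000.0, 1000.0,
                                  protected_latch_ratio=0.25,
                                  protected_cell_ratio=0.75)
```

Here the 1100 gates in levels four and five are ignored, leaving 500 FIT of logic, 750 FIT of unprotected latches, and 250 FIT of unprotected SRAM, for an intermediate value of 1500 FIT.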

Thus far, the raw SER of an IC chip or system has been modeled assuming that all bit flips are of consequence. However, it is known that for the typical micro-architecture or IC chip/system, the workload residency of useful data is well less than 100% across all modeled units within the IC chip or system. Hence, to better estimate the de-rating factor or project the manifested FIT, statistics on the residency of relevant live data values within the IC chip or system are gathered. Such residency values are collected as accurately as possible on the IC chip or system under study through its available simulator as it executes a typical representative workload.

The residency data or metrics collection process is illustrated in FIGS. 5 and 6. With the workload residency data or metrics available 550, the intermediate FITs 470 value is micro-architecturally de-rated 485 to the real (or manifested) FITs 490. A comparison between the de-rated (real) FITs 490 and the total worst case Raw FITs 480 generates the net de-rating factor estimate 492.
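The final two quantities may be illustrated with a short worked example. The sketch assumes a single average residency applied to the whole intermediate value and expresses the comparison of real to raw FITs as one minus their ratio; the source does not state the exact form of the comparison, so that formula, the input figures, and the function names are assumptions.

```python
def manifested_fit(intermediate: float, residency: float) -> float:
    """De-rate the intermediate FITs by the workload residency (0 to 1)."""
    if not 0.0 <= residency <= 1.0:
        raise ValueError("residency must lie between 0 and 1")
    return intermediate * residency


def net_derating_factor(raw_total: float, real: float) -> float:
    """Fraction of the worst case raw FITs that does not manifest (assumed form)."""
    if raw_total <= 0.0:
        raise ValueError("raw FIT total must be positive")
    return 1.0 - real / raw_total


# Hypothetical inputs: 3000 raw FIT, 1500 intermediate FIT, 50% residency.
real_fit = manifested_fit(1500.0, 0.5)
factor = net_derating_factor(3000.0, real_fit)
```

With these assumed figures the manifested value is 750 FIT and the net de-rating factor is 0.75, which matches the reading given earlier in which a 75% de-rating leaves a unit susceptible 25% of the time.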

It should be noted that the methodology as described has focused on silent data corruption (SDC)-related SER manifested at the program output. The description here has focused on predicting an early stage (necessarily conservative) bound on the SDC-specific SER, or equivalently, the machine de-rating factor applicable to SDC-specific error rate estimations. However, those skilled in the art would realize that the methodology allows for breaking the micro-architectural de-rating under study into various groups of de-rating: de-rating due to corrected error class, de-rating due to checkstop error class, de-rating due to micro-architecturally vanished error class, de-rating due to incorrect architected state error class, and the like.

With reference now to FIG. 5, a flowchart representation illustrating two approaches for estimating the micro-architectural workload residency of a chip in accordance with one illustrative embodiment is depicted. The operations outlined in FIG. 5 may be implemented, for example, in the Mi models, where i=0, 1, 2, 3, e.g., models 312, 322, 332, and 342 in FIG. 3, with a particular emphasis on 322 for purposes of this description.

Structures in a microprocessor IC chip can be broadly classified into two major groups: logic and storage. Logic structures can be defined to be the various data and control processing units on the IC chip that are made up of combinational logic gates and latches. Typical examples of on-chip logic structures then include the functional units, e.g., fixed-point unit (FXU) pipelined logic datapath (with its associated control logic) and the instruction decode unit (IDU) logic. Storage is defined to be the various structures that hold data values, such as the queues, register files, and other SRAM macros, for example. Of course, latches may also serve as staging and data-hold resources, especially during stalls in a pipeline flow. In this case, depending on how such stalls are implemented in relation to the clock-gating functionality within the pipeline, certain latch banks may also be categorized within the storage class. However, the residency modeling for such pipeline latches is simpler than register files and arrays and is better treated under the logic category.

Workload residency modeling 500, as depicted in FIG. 5, which may be performed, for example, by the residency analyzer 214 in FIG. 2, attempts to measure the opportune proportion of cycles during a workload execution for which bit-flip events could alter program correctness. Hence, to accurately capture such residency data, the focus includes only the true path of program execution. For example, dataflow-centric soft errors on a wrongly speculated path during program execution cannot alter program output. Similarly, rejected or flushed executions, dead instructions, NOPs (no operations), and performance-enhancing instructions (e.g., those related to data pre-fetch) do not contribute to SER-induced data corruption. In effect, for logic structures, the question that is asked is, during what fraction of the cycles is there an operation that uses a particular logic structure in such a manner as to lead to actual completed instructions? For storage structures, the question is, during what fraction of cycles is the storage resource holding a value that will subsequently be used in the true execution path?
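The distinction between utilization and residency for a logic structure can be sketched with a per-cycle trace. The record layout, the `kind` labels, and the function name below are hypothetical and do not reflect the output format of any particular simulator; only cycles labeled as useful true-path work count toward residency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleRecord:
    """One busy cycle of one unit, labeled with the kind of work performed."""
    unit: str
    kind: str  # "useful", "wrong_path", "flushed", "dead", "nop", or "prefetch"


def utilization_and_residency(records, unit: str, total_cycles: int):
    """Return (utilization, residency) for a unit over a workload run.

    Utilization counts every busy cycle; residency counts only cycles
    spent on operations that lead to completed true-path instructions.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    busy = [r for r in records if r.unit == unit]
    useful = [r for r in busy if r.kind == "useful"]
    return len(busy) / total_cycles, len(useful) / total_cycles


# Hypothetical 8-cycle run: the FXU is busy for 4 cycles, 2 of them useful.
trace = [
    CycleRecord("FXU", "useful"),
    CycleRecord("FXU", "wrong_path"),
    CycleRecord("FXU", "nop"),
    CycleRecord("FXU", "useful"),
    CycleRecord("IDU", "prefetch"),
]
fxu_utilization, fxu_residency = utilization_and_residency(trace, "FXU", 8)
```

In this made-up trace the FXU shows 50% utilization but only 25% residency, because the wrongly speculated operation and the NOP cannot alter program output.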

When it comes to SER modeling, there are often attempts to use microarchitecture utilization as a proxy for actual residency. However, a close examination of utilization and residency in a complex microprocessor pipeline shows a potentially significant difference between the two. When utilization is used, there are often corrective factors that may be applied (e.g., use of average stall event or dead instruction statistics) to approach a better residency average. However, in general, there are many more sources of de-rating imposed by the micro-architecture-workload pair. The effective correction factors to the computed utilization data in proxy for residency (due to all sources) may be awkward and error-prone to derive individually via average statistical behavior alone. Hence, with the framework 300 of the illustrative embodiments, the micro-architecture simulator is accurately instrumented for gathering actual residency data as shown in FIG. 5.

As the process 500 in FIG. 5 portrays, there are two main ways whereby overall residency data can be gathered using the micro-architecture or IC chip simulator. One approach is to directly instrument 520 the IC chip or micro-architecture simulator (often written for performance data gathering) with the residency metrics data gathering instructions. In that case, when the simulator is fed with a workload trace 510, the necessary residency data 550 is gathered. Another approach for gathering residency data, such as when the source performance simulator is not available for instrumentation 530, is to resort to capturing a scroll pipe output 540 of a performance run and using an analysis module 545 to gather the residency data from such an output. A scroll pipe output is a representation of the timing state of instructions or internal operations moving through the microprocessor pipeline. It is, effectively, a visualization medium that (usually graphically) represents on a cycle-by-cycle basis the possible locations in a pipeline of various internal data (i.e. instructions, cache access data, control signals) and, for each such location in use, what instruction, datum, etc. is using that location on each cycle. Given the capability to visualize the utilization of the pipeline, one can then try to estimate the residency using various means.

With reference now to FIG. 6, a flowchart representation illustrating an in-depth micro-architectural residency gathering of an IC chip through a simulator in accordance with one illustrative embodiment is depicted. The operation outlined in FIG. 6 may be an in-depth illustration of the operations performed as part of elements 520 or 545 in FIG. 5, which again may be performed by the Mi models. The process 600 in FIG. 6 illustrates an example of the detailed residency gathering operations within a performance simulator or a scroll pipe analyzer. Basically, each instruction records significant event times during simulated execution, and these per-instruction metrics are gathered when an instruction completes. Starting at the first cycle of simulation 610, and for each subsequent cycle of the simulation, the analysis tool starts at the first pipeline stage 615 and considers each pipeline stage in turn 655. For each pipeline stage, a determination is made as to whether an instruction is first entering that pipeline stage 620. If there is an instruction entering that stage, the time of the instruction's entry into the stage (i.e., the current cycle of simulation) is recorded in the instruction's internal data structure (e.g., a lop data structure) 625. A determination is then made as to whether an instruction is leaving the pipeline stage 630. If so, then the time of the instruction's exit from the stage (i.e., the current cycle of simulation) is recorded in the instruction's internal data structure (e.g., a lop data structure) 635.

A determination is made as to whether an instruction completes on this cycle 640. If so, then the residency data contributed by this instruction is calculated and recorded in the global residency tracking data-structures of the analysis tool 645. If there are more pipeline stages 650, then the analysis continues with the next pipeline stage 655, otherwise the analysis moves on to the next cycle of the simulation 660 wherein the scan of pipeline stages is begun anew for that cycle of simulation. In this way, the contribution of each instruction throughout the simulation can be recorded with the instruction, and is promoted to the global residency statistics only when the instruction completes (and therefore proves to have been from a final execution pass on the correct execution path).
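By way of illustration only, the per-cycle, per-stage scan of FIG. 6 may be sketched as follows. The sketch assumes a hypothetical input format that is not part of the description above: `occupancy[cycle][stage]` names the instruction occupying each pipeline stage, and `completes[cycle]` names the instructions completing on that cycle. For brevity, the completion check is performed once per cycle after the stage scan, and an instruction is assumed to visit each stage at most once.

```python
from collections import defaultdict

def gather_residency(occupancy, completes):
    """Sketch of the FIG. 6 scan over a hypothetical occupancy table.

    occupancy[cycle][stage] -> instruction id or None (assumed format)
    completes[cycle]        -> set of instruction ids completing that cycle
    Returns {stage: cycles attributable to completed instructions}.
    """
    n_stages = len(occupancy[0])
    entry = {}                      # (instr, stage) -> cycle of entry
    exit_ = {}                      # (instr, stage) -> cycle of exit
    residency = defaultdict(int)    # global residency statistics
    for cycle, row in enumerate(occupancy):
        prev = occupancy[cycle - 1] if cycle > 0 else [None] * n_stages
        for stage in range(n_stages):
            cur = row[stage]
            if cur is not None and cur != prev[stage]:
                entry[(cur, stage)] = cycle          # instruction enters stage
            if prev[stage] is not None and prev[stage] != cur:
                exit_[(prev[stage], stage)] = cycle  # instruction leaves stage
        for instr in completes.get(cycle, ()):
            # Promote per-instruction times to the global statistics only
            # when the instruction completes.
            for (i, stage), t_in in entry.items():
                if i == instr:
                    t_out = exit_.get((i, stage), cycle + 1)
                    residency[stage] += t_out - t_in
    return dict(residency)

# Instruction "B" is flushed after stage 1 and never completes.
occupancy = [
    ["A", None, None],
    ["B", "A", None],
    ["C", "B", "A"],
    [None, "C", None],
    [None, None, "C"],
]
print(gather_residency(occupancy, {2: {"A"}, 4: {"C"}}))
```

In this small example stage 0 is occupied for three cycles, but only two of them are credited as residency because the flushed instruction never completes, which is the distinction between utilization and residency drawn above.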

As can be seen from the foregoing, illustrative embodiments present a phased methodology framework that allows progressive refinement of soft-error related de-rating and failure in time analysis as the design progresses from (pre)concept phase to high-level design phase and the RTL implementation phase. By the nature of the illustrative embodiments, the methodology lends itself to a pipelined evaluative framework that allows quicker start of SER analysis for a next generation microprocessor IC chip design starting off from the present phase of a current design, enabling higher design/evaluation throughput in a multi-chip design process.

As discussed above, in addition to the machine derating performed using the mechanisms described above with regard to FIGS. 3-6 as one example implementation of the machine derating front-end engine of the illustrative embodiments, the illustrative embodiments further utilize an application derating front-end engine 220 that determines SER FIT projections at the application level. This is done by injecting faults into an actual hardware device 250 that represents the new integrated circuit device design and then monitoring and collecting information from this hardware device 250 to determine SER FIT projections as the hardware device 250 executes the application workload.

In one illustrative embodiment, the injection of faults into the hardware device 250 utilizes a modified process trace (ptrace) tool. Ptrace is a UNIX operating system tool that is used to perform program debugging. The ptrace tool allows one process to control another process, thereby enabling the controller process to inspect and manipulate the internal state of the controlled process. A ptrace call allows the ptrace tool to be attached to another process and control the process to which it is attached including manipulation of its file descriptors, memory, and registers. The ptrace tool can single-step through the target process, i.e. the controlled process, code and observe system calls and signals.

In one illustrative embodiment, the ptrace tool is attached to the application workload 240 and used to obtain the architectural state information for the hardware device 250, e.g., the register states and memory states associated with the application workload 240. The ptrace tool allows one to look at the architected state, which means the underlying processor's architected register state, as visible to the particular application workload 240, as well as the state of the memory allocated to that particular application workload 240. Existing ptrace tools are used to read the architected state at user-specified breakpoints in a program, in order to debug the execution of the program. The illustrative embodiments use the ptrace tool to read the architected state, but then also to write back a modified state and allow continued execution so as to understand the behavior of the application under an injected error. Repeating such random injection experiments allows one to estimate the application derating component for transient error (e.g., SER) analysis. Thus, the illustrative embodiments provide additional, new mechanisms built on top of the basic ptrace tool in order to achieve an entirely new functional capability not previously known with existing ptrace tools.

Thus, with the illustrative embodiments, the ptrace tool is used to obtain the architectural state information for the hardware device 250 and this architectural state is then written out to a storage device. The application derating front-end engine 220 selects one or more register/memory elements to have their states modified to thereby inject a fault. The particular register/memory element(s) that are selected may be selected using a directed selection operation where they are predetermined or a random/pseudo-random selection operation.

The state(s) of the selected register/memory element(s) is/are modified to thereby inject a fault that simulates a soft error or transient error. The actual modification of the state(s) may be performed by using the ptrace tool to view and read out the state, modify it and then write it back to the register/memory word. Thereafter, the fault injector, e.g., the ptrace tool in this example, is detached from the process, e.g., the application workload 240 in this example, and the process is permitted to continue execution.

At some time later, such as at a next interval at which the ptrace tool is again called, at the end of execution of the application workload 240, or the like, the state of the various register/memory elements is again read and stored. This later architectural state information may be compared against a “golden” architectural state that identifies what the architectural state of the hardware 250 should be when the application workload 240 is executed properly on the hardware 250. Any discrepancies between the “golden” architectural state and the architectural state read from the hardware 250 after injection of the fault may be determined to be due to the injected fault. That is, the execution of a program or application on a computer is a deterministic process, in that multiple executions (without any deliberately injected or spuriously induced error) would result in exactly the same sequence of architected state transitions (as visible to the program or application), culminating in the same final state at the end of program execution. Therefore any mismatch against the error-free execution's “golden” architected state(s), at any point after the deliberate error or fault injection, can be attributed to that injected error or fault. When there is a difference between the golden architectural state and the actual detected architectural state, and this difference can be traced to the injected fault, then the difference is considered to be a soft error. If there is no difference between the golden architectural state and the detected architectural state, then it can be determined that the injected fault did not cause a soft error to occur at the application level.
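As a minimal sketch of the comparison just described, and assuming (purely for illustration) that an architected state snapshot is represented as a mapping from element names to integer values, the mismatch set may be computed as follows. The element names used here are hypothetical.

```python
def diff_states(golden, observed):
    """Return {element: (golden_value, observed_value)} for each mismatch.

    Both arguments are assumed to be dict snapshots of the
    application-visible architected state (registers and memory words).
    """
    keys = golden.keys() | observed.keys()
    return {
        k: (golden.get(k), observed.get(k))
        for k in sorted(keys)
        if golden.get(k) != observed.get(k)
    }

golden = {"r1": 5, "r2": 7, "mem[0x10]": 0}
after_fault = {"r1": 5, "r2": 3, "mem[0x10]": 0}

mismatches = diff_states(golden, after_fault)
# An empty result means the injected fault was masked at the application level.
print(mismatches if mismatches else "fault masked")
```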

Thus, in summary, using the mechanisms of the illustrative embodiments, in a single-error random injection methodology, over the execution period of the application, the fault injection operation may take the form of:

(1) a directed or random instant is chosen for error or fault injection;

(2) at that instant, the execution is frozen, and one of the elements (selected by way of a directed selection or a random/pseudo-random selection) of the application-visible architected state (i.e. architected register or memory state) is read out;

(3) one of the data bits of the selected state element is flipped in value and written back to the register or memory word;

(4) execution is then resumed and the effect of the injected error or fault is observed.
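Steps (2) and (3) above can be sketched on a simulated architected state as follows. This is only an illustrative model: the state is assumed to be a dict of integer-valued elements, the function names and the 64-bit word width are assumptions, and a real injector would instead read and write the stopped process through ptrace requests (for example, `PTRACE_PEEKDATA` and `PTRACE_POKEDATA` on Linux).

```python
import random

def flip_bit(value, bit, width=64):
    """Return value with the given bit inverted, truncated to width bits."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)

def inject_single_fault(state, rng, width=64, element=None, bit=None):
    """Read one state element, flip one of its bits, and write it back.

    element/bit given  -> directed injection
    element/bit absent -> pseudo-random selection using rng
    Returns (element, bit, original_value) so the experiment can be logged.
    """
    if element is None:
        element = rng.choice(sorted(state))
    if bit is None:
        bit = rng.randrange(width)
    original = state[element]                         # step (2): read out
    state[element] = flip_bit(original, bit, width)   # step (3): flip, write back
    return element, bit, original

state = {"r1": 0b1010, "r2": 0}
inject_single_fault(state, random.Random(0), element="r1", bit=1)  # directed
print(bin(state["r1"]))
```

Seeding the pseudo-random generator, as done here, keeps an injection experiment reproducible, which is convenient when a particular outcome needs to be re-examined.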

This fault injection, detection of resulting architectural state, and comparison may be performed repeatedly. From this repeated operation, a soft error rate can be determined, i.e. a number of times an injected fault causes a soft error to occur in the application workload execution. These soft error rates (SERs) may be combined, by the backend engine 290, with the SER values determined from the machine derating to generate a total SER FIT projection for the integrated circuit device design. That is, a large number of experiments (e.g., several thousand) may be done to collect statistics about “what happened” as a result of the error or fault injections. The probabilities of various effects may then be calculated, e.g., the probability that the application execution is totally unaffected in terms of correctness and completions, the probability that the application completes but with output data errors, the probability that the application terminates prematurely, or the like.

For example, suppose 10,000 such single-error injections are done, to observe the effect on the application, and suppose only the end of execution application state is examined, in cases where the application actually completes. Out of these experiments, suppose 2000 result in silent data corruption (SDC), 1000 result in premature termination, i.e. the execution is not able to complete, 200 result in the program entering a “hung” state, i.e. either an infinite loop or being stuck at the same point without further progress, and the remaining 6800 experiments result in no effect whatsoever, i.e. the program or application completes without any corruption of the final output or application-visible architected state. Then, the probability of SDC failure caused by an architected state's single bit flip incident is computed as 2000/10,000 or 0.2. The architectural derating (masking) factor, with regard only to SDC, is 8000/10,000 or 80%. Now suppose the micro-architecture (machine) derating factor is computed using the other part of the illustrative embodiments as being 75%, which means that the probability of a random micro-architectural state latch or buffer bit flip causing a corruption of the application-visible architected state is 0.25 or 25%. Then, the probability of a random latch flip causing an SDC failure at the program output is 0.25*0.2=0.05. From this probability, and a knowledge of the actual “raw” SER FITs for the processor/system, the actual, derated SER FITs (with regard to SDC failures) can be computed. Taking the reciprocal of the derated SER FITs gives the mean time to failure or MTTF (in units of 10^9 device-hours, given the conventional definition of one FIT as one failure per 10^9 device-hours).
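The arithmetic of this example can be reproduced with the short sketch below. The outcome counts and the 75% machine derating factor are taken from the example; the raw SER value of 1000 FIT is a hypothetical placeholder, and the conversion to hours assumes the conventional definition of one FIT as one failure per 10^9 device-hours.

```python
def sdc_derating(outcomes, md_factor, raw_fit):
    """Combine AD and MD results into a derated SDC FIT and an MTTF.

    outcomes  : counts per experiment outcome (must include "sdc")
    md_factor : machine derating (masking) factor, e.g. 0.75
    raw_fit   : raw SER in FIT (failures per 1e9 device-hours)
    """
    total = sum(outcomes.values())
    p_sdc = outcomes["sdc"] / total        # P(SDC | architected bit flip)
    p_corrupt = 1.0 - md_factor            # P(architected corruption | latch flip)
    p_latch_sdc = p_corrupt * p_sdc        # P(SDC | latch flip)
    derated_fit = raw_fit * p_latch_sdc
    return {
        "p_sdc": p_sdc,
        "ad_factor_sdc": 1.0 - p_sdc,
        "p_latch_sdc": p_latch_sdc,
        "derated_fit": derated_fit,
        "mttf_hours": 1e9 / derated_fit,
    }

outcomes = {"sdc": 2000, "terminated": 1000, "hung": 200, "no_effect": 6800}
result = sdc_derating(outcomes, md_factor=0.75, raw_fit=1000.0)  # raw_fit is hypothetical
for name, value in result.items():
    print(f"{name}: {value:g}")
```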

Thus, the illustrative embodiments provide a unified derating tool for performing both machine derating and application derating analysis so as to determine a total SER FITs projection for an integrated circuit device design. The unified derating tool performs simulation of the integrated circuit device design to determine a machine level derating while performing an application derating operation using existing hardware that uses a same instruction set architecture and approximates the operation of the integrated circuit device design at hardware speeds. This speeds up the derating operation by eliminating the need to perform application derating using a full design simulation.

FIG. 7 is a flowchart outlining an example operation for performing machine derating and application derating projections in accordance with one illustrative embodiment. The operation outlined in FIG. 7 may be implemented, for example, by the derating tool 200 in FIG. 2. Thus, various operations set forth in FIG. 7 may be implemented in special purpose hardware, software executing on one or more data processing devices or other general purpose hardware mechanisms, and/or any combination of special purpose hardware and software executing on general purpose hardware mechanisms.

As shown in FIG. 7, the operation starts with attaching a ptrace tool or other fault injector to the application workload (step 710). The ptrace tool is attached initially to profile the execution characteristics of the application or program in question as well as to be used in performing application derating as described above and hereafter. A determination is made as to whether a machine derating operation is to be performed (step 715). If so, then the application workload is profiled to determine the composition of instructions in the application workload (step 720). A residency analyzer is then used to determine residency information during a simulation of the execution of the application workload on the integrated circuit device design (step 725). A determination is then made as to whether the application workload has exited or not (step 730). If not, the operation returns to step 715. If the application workload has exited, then the operation terminates.

Returning to step 715, if a machine derating operation is not to be performed, then a determination is made as to whether an application derating operation is to be performed (step 735). If so, then the architected state of existing hardware upon which the application workload is going to be executed is read and stored (step 740). One or more faults are injected into the hardware (step 745) and the injected fault states are written to the hardware elements (step 750). The fault injector is then detached (step 755). Thereafter, a determination is made as to whether the application workload has exited or not (step 730) and if so, then the operation terminates; otherwise the operation returns to step 715. The operation outlined in FIG. 7 may be repeated multiple times to obtain a statistical representation of SER FIT projections for an integrated circuit device design.

The operation outlined in FIG. 7 assumes that both the MD and AD operations are performed using a single host hardware processor and thus, a determination as to whether MD or AD operations are to be performed is utilized. However, in other illustrative embodiments, such as those in which multiple processors in a host system are utilized, the MD and AD operations may be performed in parallel, in which case such a decision or branching operation becomes unnecessary.

FIG. 7 illustrates the operation for performing the MD and AD testing operations. These MD and AD testing operations result in MD and AD factor data that may be converted to probability information for various types of conditions, e.g., silent data corruption, hung states, no corruption, etc. These probability values may then be used, along with raw SER FIT values, to generate derated SER FIT values. These derated SER FIT values may then be used to direct modification of a hardware design.

FIG. 8 is a flowchart outlining an example operation for performing machine derating in accordance with one illustrative embodiment. The operation outlined in FIG. 8 may be implemented, for example, by the machine derating front-end engine 210 in FIG. 2.

As shown in FIG. 8, the operation starts with a ptrace tool or other fault injector being attached to the application workload (step 810). A next basic block of instructions in the application workload is detected, where a “basic block” is a set of instructions in the application workload that terminates in a branch instruction (step 815). A determination is made as to whether this is a new basic block or a basic block that has already been analyzed for machine derating purposes (step 820). This determination may be made based on a data structure, such as a linked list or table data structure, that stores identifiers of currently found basic blocks. As basic blocks are found during the operation, their identifiers are stored in the data structure and a lookup operation may be performed using this data structure to determine whether a basic block has previously been found.

If the basic block is a new basic block that has not previously been analyzed for machine derating purposes, the basic block is profiled to determine the instruction composition of the basic block, the profile is stored for later use, and a breakpoint is added to the basic block (step 825). The breakpoint may be added in the basic block of the source code so that the ptrace tool can be used to detect the end of the basic block during the dynamic analysis of the application or program execution. Residency analysis is then performed on the basic block using a simulation of the integrated circuit device design to generate residency statistics (step 830).

If the basic block is not a new basic block, then a lookup of the profile of the basic block is performed (step 835). The operation then goes to step 830 where residency analysis is performed. A determination is then made as to whether the application workload execution has exited (step 840). If not, the operation returns to step 815 where a next basic block of instructions in the application workload is detected and the operation is repeated for the next basic block. If the application workload execution does exit, the operation terminates. It should be noted that the residency analysis performed as part of the machine derating operation is a primary operation for computing the MD factor. Once the MD and AD factors are computed, then the raw SER FITs can be derated to produce the final, derated SER FITs. The derated SER FITs are then inverted (via a reciprocal function) to compute the mean time to failure (MTTF), as previously explained. It should also be noted that, in the calculation of MD factors, using the mechanisms of the illustrative embodiments, repeated experiments are not needed, i.e. a single analysis run is sufficient. The AD operation involves repeated, iterative experiments in order to draw statistically meaningful conclusions about the AD factor.
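The basic block detection and profile lookup of FIG. 8 may be sketched as follows, under simplifying assumptions: the workload is modeled as a flat list of instruction mnemonics rather than a live traced process, a basic block is identified by the tuple of its instructions (a real tool would more likely use its start address), and the `is_branch` and `classify` callbacks are hypothetical.

```python
from collections import Counter

def split_basic_blocks(trace, is_branch):
    """Yield basic blocks: runs of instructions ending in a branch (step 815)."""
    block = []
    for instr in trace:
        block.append(instr)
        if is_branch(instr):
            yield tuple(block)
            block = []
    if block:                      # trailing instructions without a branch
        yield tuple(block)

def profile_workload(trace, is_branch, classify):
    """Profile each new basic block once; reuse stored profiles afterwards."""
    profiles = {}                  # block identifier -> instruction composition
    executions = Counter()         # block identifier -> dynamic execution count
    for block in split_basic_blocks(trace, is_branch):
        if block not in profiles:                        # step 820: new block?
            profiles[block] = Counter(classify(i) for i in block)  # step 825
        # step 835 is the dictionary lookup; step 830 (residency analysis)
        # would be invoked here with profiles[block].
        executions[block] += 1
    return profiles, executions

UNIT = {"add": "fxu", "ld": "lsu", "fmul": "fpu", "br": "bru"}
trace = ["add", "ld", "br", "add", "ld", "br", "fmul", "br"]
profiles, executions = profile_workload(trace, lambda i: i == "br", UNIT.get)
print(len(profiles), "distinct blocks;", dict(executions))
```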

FIG. 9 is a flowchart outlining an example operation for performing residency analysis for a basic block in accordance with one illustrative embodiment. The operation outlined in FIG. 9 may be implemented, for example, by the residency analyzer 214 of the machine derating front-end engine 210 in FIG. 2.

As shown in FIG. 9, the operation starts by reading the basic block that is being analyzed and the cycle count C is set to 0 (step 910). The values for calculating residency statistics are initialized (step 915). For example, the instruction number i is initialized to 1, the total number N(j) of instructions for a particular instruction type j is initialized to 0, the type of instruction j is initialized to 1, and the value for the number of functional unit types (or classes) M is set to an initial value based on the integrated circuit device design.

The ith instruction is then scanned (step 920) and a determination is made as to the type of instruction, i.e. what functional unit type is used to execute the instruction, e.g., a floating point unit, branch execution unit, fixed point execution unit, or the like (step 925). A corresponding total number of instructions in the basic block of the particular instruction type of the ith instruction is incremented (step 930), e.g., if the ith instruction is of a functional unit type/class j, then the total instructions of the functional unit type/class j N(j) is incremented, i.e. N(j)=N(j)+1.

A determination is then made as to whether the total number of instructions of the particular functional unit type j is equal to an issue bandwidth limit for the functional unit type, i.e. FU-max(j) (step 935). The issue bandwidth limit indicates a maximum number of instructions of a particular type that may be dispatched or issued per processor cycle. For example, if the integrated circuit device design comprises two load/store units, a maximum of two load/store instructions per cycle may be dispatched.

If the issue bandwidth limit for the functional unit type is not equal to the total number of instructions of the particular functional unit type j, then a determination is made as to whether the value of i is equal to the basic block length BB-len (step 940). If the value of i is equal to the BB-len, i.e. instruction i is the last instruction in the basic block, then residency statistics for the basic block are calculated (step 945). For example, these residency statistics may include cycles per instruction (CPI), instructions per cycle (IPC), floating point execution unit utilization, load/store unit utilization, fixed point execution unit utilization, and the like. For example, CPI may be calculated as the number of cycles C divided by the basic block length with IPC being the inverse of CPI. The various utilization statistics may be calculated as the number of instructions of the corresponding functional unit type divided by the product of the basic block length and the instructions per cycle. Other statistical measures of utilization and residency may be calculated without departing from the spirit and scope of the illustrative embodiments.

If the value of i is not equal to the basic block length BB-len, then the value of i is incremented (step 950) and the operation returns to step 920 where the next instruction is scanned. If the total number of instructions for the particular functional unit type is equal to the issue bandwidth limit for the functional unit type, i.e. FU-max(j) (step 935), then the number of cycles is incremented, the value of i is incremented, and the values of N(j) and j are initialized (step 955). The operation then returns to step 920. It should be appreciated that the residency analysis operation outlined in FIG. 9 may be implemented repeatedly for each basic block in the application workload.
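One possible reading of the FIG. 9 operation is sketched below. Two details are assumptions of the sketch rather than statements of the flowchart: a partially filled cycle remaining at the end of the basic block is counted as one cycle, and a separate per-type total is kept for the utilization statistics because N(j) is re-initialized whenever a cycle closes. The function and parameter names are hypothetical, and the basic block is assumed to be non-empty.

```python
from collections import Counter

def analyze_basic_block(block, fu_type, fu_max):
    """Estimate cycles, CPI, IPC and utilization for one basic block.

    block   : sequence of instructions (non-empty)
    fu_type : callable mapping an instruction to its functional unit type j
    fu_max  : dict mapping type j to its issue bandwidth limit FU-max(j)
    """
    cycles = 0                 # C
    in_cycle = Counter()       # N(j) within the current cycle
    totals = Counter()         # per-type totals over the whole block (assumption)
    for instr in block:        # steps 920-930
        j = fu_type(instr)
        in_cycle[j] += 1
        totals[j] += 1
        if in_cycle[j] == fu_max[j]:   # step 935: issue bandwidth reached
            cycles += 1                # step 955: close the cycle, reset N(j)
            in_cycle.clear()
    if in_cycle:               # assumption: trailing partial cycle counts as one
        cycles += 1
    bb_len = len(block)
    cpi = cycles / bb_len      # step 945
    ipc = 1.0 / cpi
    # Utilization as described above: N_j / (BB-len * IPC).
    utilization = {j: n / (bb_len * ipc) for j, n in totals.items()}
    return {"cycles": cycles, "cpi": cpi, "ipc": ipc, "utilization": utilization}

UNIT = {"ld": "lsu", "add": "fxu", "fmul": "fpu"}
stats = analyze_basic_block(
    ["ld", "ld", "add", "fmul", "ld"], UNIT.get, {"lsu": 2, "fxu": 2, "fpu": 2}
)
print(stats["cycles"], round(stats["cpi"], 3), round(stats["ipc"], 3))
```

In this example the two leading loads saturate the two load/store units and close the first cycle; the remaining three instructions share a second cycle, giving two cycles for five instructions.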

FIG. 10 is a flowchart outlining an example operation for performing application derating in accordance with one illustrative embodiment. The operation shown in FIG. 10 may be implemented, for example, by the application derating front-end engine 220 in FIG. 2.

As shown in FIG. 10, the operation starts by attaching a ptrace tool or other fault injector to the application workload (step 1010). Fault injection parameters are then received (step 1015) and a determination is made as to whether the fault injection is a directed fault injection (step 1020). If so, then the state(s) of the integrated circuit element(s) into which the fault is to be injected is/are acquired (step 1025). If the fault injection is not a directed fault injection, then one or more integrated circuit element(s) are selected either randomly or pseudo-randomly and their corresponding state(s) are acquired (step 1030). One or more bits of the selected integrated circuit element(s) are then flipped (step 1035) and the resulting state of the integrated circuit element(s) is written back to the integrated circuit element(s) (step 1040). The ptrace tool or fault injector is then detached from the application workload (step 1045) and a determination is made as to whether additional fault injections should be performed (step 1050). For example, each application may need to be executed many times, e.g., thousands of times, in order to draw statistically meaningful conclusions about a particular fault injection experiment. Thus, this iterative loop shows the same fault injection experiment being repeated many times, especially for random injection experiments. For directed injection experiments, multiple iterative runs may not be needed if a particular instant of injection and a particular data bit is somehow the only object of interest.

If additional fault injections should be performed with the ptrace tool being re-attached to the application workload, the operation returns to step 1010. If no more fault injections are to be performed, the operation terminates. The results of this operation may be used to generate an AD factor calculation that may be combined with the MD factor calculation and raw SER FIT values to derive derated SER FIT values which may be inverted to generate MTTF values. This information may then be used to direct hardware redesign efforts.
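The iterative loop of FIG. 10 may be illustrated with a toy model. Everything in the sketch is an assumption made for illustration: the "workload" is a four-step computation over a small register dict rather than a traced process, a fault is described by a hypothetical `(step, register, bit)` tuple, and only the final output is compared against the golden run.

```python
import random

REGS = ("r0", "r1", "r2", "acc")

def run_workload(inject=None):
    """Toy workload; inject = (step, register, bit), or None for the golden run."""
    s = {"r0": 3, "r1": 4, "r2": 9, "acc": 0}
    steps = [
        lambda s: s.update(acc=s["r0"]),
        lambda s: s.update(acc=s["acc"] + s["r1"]),
        lambda s: s.update(r2=0),      # overwrites r2: earlier r2 flips are masked
        lambda s: s.update(acc=s["acc"] + s["r2"]),
    ]
    for n, step in enumerate(steps):
        if inject is not None and inject[0] == n:
            _, reg, bit = inject
            s[reg] ^= 1 << bit         # steps 1035/1040: flip one bit, write back
        step(s)
    return s["acc"]

def campaign(n_runs, seed=0, directed=None):
    """Return the fraction of runs whose output differs from the golden output."""
    rng = random.Random(seed)
    golden = run_workload()
    corrupted = 0
    for _ in range(n_runs):            # step 1050: repeat the experiment
        # step 1020: directed fault, or pseudo-random instant/element/bit
        fault = directed or (rng.randrange(4), rng.choice(REGS), rng.randrange(8))
        if run_workload(fault) != golden:
            corrupted += 1
    return corrupted / n_runs

p_corrupt = campaign(2000)
print(f"estimated corruption probability: {p_corrupt:.3f}")
print(f"estimated AD (masking) factor:    {1 - p_corrupt:.3f}")
```

A directed fault such as `(1, "r2", 3)` is always masked in this model because `r2` is overwritten before it is next read, which mirrors the observation that a single run may suffice for a directed experiment.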

As with the other operations of the other flowcharts, this operation may be repeated many times in order to obtain a statistical representation of the manifestation of injected faults in the results of the execution of the application workload from an application level.

The illustrative embodiments may be utilized in many different types of data processing environments. In order to provide a context for the description of the specific elements and functionality of the illustrative embodiments, FIG. 11 is an example environment in which aspects of the illustrative embodiments may be implemented. It should be appreciated that FIG. 11 is only an example and is not intended to assert or imply any limitation with regard to the environments in which aspects or embodiments of the present invention may be implemented. Many modifications to the depicted environment may be made without departing from the spirit and scope of the present invention.

Referring to FIG. 11, an exemplary block diagram of a dual threaded processor design showing functional units and registers is depicted in accordance with an illustrative embodiment. Processor 1100 may be implemented as a processing unit in a data processing system, for example, in these illustrative examples. Processor 1100 comprises a single integrated circuit superscalar microprocessor with dual-thread simultaneous multi-threading (SMT) that may also be operated in a single threaded mode. Accordingly, as discussed further herein below, processor 1100 includes various units, registers, buffers, memories, and other sections, all of which are formed by integrated circuitry. Also, in an illustrative embodiment, processor 1100 operates according to reduced instruction set computer (RISC) techniques.

As shown in FIG. 11, instruction fetch unit (IFU) 1102 connects to instruction cache 1104. Instruction cache 1104 holds instructions for multiple programs (threads) to be executed. Instruction cache 1104 also has an interface to level 2 (L2) cache/memory 1106. IFU 1102 requests instructions from instruction cache 1104 according to an instruction address, and passes instructions to instruction decode unit 1108. In an illustrative embodiment, IFU 1102 may request multiple instructions from instruction cache 1104 for up to two threads at the same time. Instruction decode unit 1108 decodes multiple instructions for up to two threads at the same time and passes decoded instructions to instruction sequencer unit (ISU) 1109.

Processor 1100 may also include issue queue 1110, which receives decoded instructions from ISU 1109. Instructions are stored in the issue queue 1110 while awaiting dispatch to the appropriate execution units. For an out-of-order processor to operate in an in-order manner, ISU 1109 may selectively issue instructions quickly using false dependencies between instructions. If the instruction does not produce data, such as in a read-after-write dependency, ISU 1109 may add an additional source operand (also referred to as a consumer) per instruction to point to the previous target instruction (also referred to as a producer). Issue queue 1110, when issuing the producer, may then wake up the consumer for issue. By introducing false dependencies, a chain of dependent instructions may then be created, whereby the instructions may then be issued only in order. ISU 1109 uses the added consumer for instruction scheduling purposes and the instructions, when executed, do not actually use the data from the added dependency. Once ISU 1109 selectively adds any required false dependencies, then issue queue 1110 takes over and issues the instructions in order for each thread, and outputs or issues instructions for each thread to execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128 of the processor. This process will be described in more detail in the following description.
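The effect of chaining instructions through false dependencies can be illustrated with a small model. The model is an assumption made only for illustration and does not describe the actual ISU 1109 or issue queue 1110 logic: each instruction is a dict with an `id` and a set of producer ids in `deps`, a false dependency is added to every instruction rather than selectively, and the hypothetical issue policy picks the youngest ready instruction so that reordering becomes visible when no chain is present.

```python
def add_false_dependencies(instrs):
    """Make each instruction depend on its predecessor in program order."""
    chained, prev = [], None
    for ins in instrs:
        deps = set(ins["deps"])
        if prev is not None:
            deps.add(prev)             # added source operand: used for scheduling only
        chained.append({"id": ins["id"], "deps": deps})
        prev = ins["id"]
    return chained

def issue_order(instrs):
    """Issue one ready instruction per step; a producer's issue wakes its consumers."""
    issued, order = set(), []
    pending = list(instrs)
    while pending:
        ready = [i for i in pending if i["deps"] <= issued]
        pick = ready[-1]               # youngest ready instruction (assumed policy)
        pending.remove(pick)
        issued.add(pick["id"])
        order.append(pick["id"])
    return order

program = [
    {"id": 0, "deps": set()},
    {"id": 1, "deps": set()},
    {"id": 2, "deps": {0}},            # true dependency on instruction 0
]
print("without chain:", issue_order(program))
print("with chain:   ", issue_order(add_false_dependencies(program)))
```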

In an illustrative embodiment, the execution units of the processor may include branch unit 1112, load/store units (LSUA) 1114 and (LSUB) 1116, fixed point execution units (FXUA) 1118 and (FXUB) 1120, floating point execution units (FPUA) 1122 and (FPUB) 1124, and vector multimedia extension units (VMXA) 1126 and (VMXB) 1128. Execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128 are fully shared across both threads, meaning that execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128 may receive instructions from either or both threads. The processor includes multiple register sets 1130, 1132, 1134, 1136, 1138, 1140, 1142, 1144, and 1146, which may also be referred to as architected register files (ARFs).

An ARF is a file where completed data is stored once an instruction has completed execution. ARFs 1130, 1132, 1134, 1136, 1138, 1140, 1142, 1144, and 1146 may store data separately for each of the two threads and by the type of instruction, namely general purpose registers (GPRs) 1130 and 1132, floating point registers (FPRs) 1134 and 1136, special purpose registers (SPRs) 1138 and 1140, and vector registers (VRs) 1144 and 1146. Separately storing completed data by type and by thread assists in reducing processor contention while processing instructions.

The processor additionally includes a set of shared special purpose registers (SPR) 1142 for holding program states, such as an instruction pointer, stack pointer, or processor status word, which may be used on instructions from either or both threads. Execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128 are connected to ARFs 1130, 1132, 1134, 1136, 1138, 1140, 1142, 1144, and 1146 through simplified internal bus structure 1149.

In order to execute a floating point instruction, FPUA 1122 and FPUB 1124 retrieve register source operand information, which is input data required to execute an instruction, from FPRs 1134 and 1136, if the instruction data required to execute the instruction is complete or if the data has passed the point of flushing in the pipeline. Complete data is data that has been generated by an execution unit once an instruction has completed execution and is stored in an ARF, such as ARFs 1130, 1132, 1134, 1136, 1138, 1140, 1142, 1144, and 1146. Incomplete data is data that has been generated during instruction execution where the instruction has not completed execution. FPUA 1122 and FPUB 1124 input their data according to which thread each executing instruction belongs to. For example, FPUA 1122 inputs completed data to FPR 1134 and FPUB 1124 inputs completed data to FPR 1136, because FPUA 1122, FPUB 1124, and FPRs 1134 and 1136 are thread specific.

During execution of an instruction, FPUA 1122 and FPUB 1124 output their destination register operand data, or instruction data generated during execution of the instruction, to FPRs 1134 and 1136 when the instruction has passed the point of flushing in the pipeline. During execution of an instruction, FXUA 1118, FXUB 1120, LSUA 1114, and LSUB 1116 output their destination register operand data, or instruction data generated during execution of the instruction, to GPRs 1130 and 1132 when the instruction has passed the point of flushing in the pipeline. During execution of a subset of instructions, FXUA 1118, FXUB 1120, and branch unit 1112 output their destination register operand data to SPRs 1138, 1140, and 1142 when the instruction has passed the point of flushing in the pipeline. Program states, such as an instruction pointer, stack pointer, or processor status word, stored in SPRs 1138 and 1140 indicate thread priority 1152 to ISU 1109. During execution of an instruction, VMXA 1126 and VMXB 1128 output their destination register operand data to VRs 1144 and 1146 when the instruction has passed the point of flushing in the pipeline.

Data cache 1150 may also have associated with it a non-cacheable unit (not shown) which accepts data from the processor and writes it directly to level 2 cache/memory 1106. In this way, the non-cacheable unit bypasses the coherency protocols required for storage to cache.

In response to the instructions input from instruction cache 1104 and decoded by instruction decode unit 1108, ISU 1109 selectively dispatches the instructions to issue queue 1110 and then onto execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128 with regard to instruction type and thread. In turn, execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128 execute one or more instructions of a particular class or type of instructions. For example, FXUA 1118 and FXUB 1120 execute fixed point mathematical operations on register source operands, such as addition, subtraction, ANDing, ORing and XORing. FPUA 1122 and FPUB 1124 execute floating point mathematical operations on register source operands, such as floating point multiplication and division. LSUA 1114 and LSUB 1116 execute load and store instructions, which move operand data between data cache 1150 and ARFs 1130, 1132, 1134, and 1136. VMXA 1126 and VMXB 1128 execute single instruction operations that include multiple data. Branch unit 1112 executes branch instructions which conditionally alter the flow of execution through a program by modifying the instruction address used by IFU 1102 to request instructions from instruction cache 1104.

Instruction completion unit 1154 monitors internal bus structure 1149 to determine when instructions executing in execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128 are finished writing their operand results to ARFs 1130, 1132, 1134, 1136, 1138, 1140, 1142, 1144, and 1146. Instructions executed by branch unit 1112, FXUA 1118, FXUB 1120, LSUA 1114, and LSUB 1116 require the same number of cycles to execute, while instructions executed by FPUA 1122, FPUB 1124, VMXA 1126, and VMXB 1128 require a variable and larger number of cycles to execute. Therefore, instructions that are grouped together and start executing at the same time do not necessarily finish executing at the same time. “Completion” of an instruction means that the instruction has finished executing in one of execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, or 1128, has passed the point of flushing, and all older instructions have already been updated in the architected state, since instructions have to be completed in order. Hence, the instruction is now ready to complete and update the architected state, which means updating the final state of the data as the instruction has been completed. The architected state can only be updated in order, that is, instructions have to be completed in order and the completed data has to be updated as each instruction completes.
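The in-order completion rule above can be made concrete with a short sketch. It is an illustration under stated assumptions rather than a model of instruction completion unit 1154: the function name and the cycle numbers are hypothetical, any number of instructions may complete in the same cycle, and flushing is ignored. Each instruction completes at the later of its own finish cycle and the completion cycle of the instruction before it.

```python
def completion_cycles(finish_cycles):
    """Map per-instruction finish cycles (program order) to in-order completion cycles.

    An instruction may complete only once it has finished executing and every
    older instruction has completed, so completion times never decrease.
    """
    completed = []
    latest = 0
    for finish in finish_cycles:
        latest = max(latest, finish)  # wait for all older instructions
        completed.append(latest)
    return completed


# Four instructions dispatched together; the second is a long-latency
# floating point operation, so the two younger ones finish before it.
finish = [3, 9, 4, 5]
complete = completion_cycles(finish)
```

Here the third and fourth instructions finish executing in cycles 4 and 5 but cannot update the architected state until the second instruction completes in cycle 9.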

Instruction completion unit 1154 monitors for the completion of instructions, and sends control information 1156 to ISU 1109 to notify ISU 1109 that more groups of instructions can be dispatched to execution units 1112, 1114, 1116, 1118, 1120, 1122, 1124, 1126, and 1128. ISU 1109 sends dispatch signal 1158, which serves as a throttle to bring more instructions down the pipeline to the dispatch unit, to IFU 1102 and instruction decode unit 1108 to indicate that it is ready to receive more decoded instructions. While processor 1100 provides one detailed description of a single integrated circuit superscalar microprocessor with dual-thread simultaneous multi-threading (SMT) that may also be operated in a single threaded mode, the illustrative embodiments are not limited to such microprocessors. That is, the illustrative embodiments may be implemented in any type of processor using a pipeline technology.

The method as described above is used in the fabrication of integrated circuit chips. The resulting integrated circuit chips can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips), as a bare die, or in a packaged form. In the latter case the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher level carrier) or in a multichip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections). In any case the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a motherboard, or (b) an end product. The end product can be any product that includes integrated circuit chips, ranging from toys and other low-end applications to advanced computer products having a display, a keyboard or other input device, and a central processor. Moreover, the end products in which the integrated circuit chips may be provided may include game machines, game consoles, hand-held computing devices, personal digital assistants, communication devices, such as wireless telephones and the like, laptop computing devices, desktop computing devices, server computing devices, or any other computing device.

As noted above, it should be appreciated that the illustrative embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In one example embodiment, the mechanisms of the illustrative embodiments are implemented in software or program code, which includes but is not limited to firmware, resident software, microcode, etc.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.

The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

What is claimed is:
1. A method, in a data processing system, for predicting effects of soft errors on an integrated circuit device design, comprising: configuring the data processing system to implement a unified derating tool, wherein the unified derating tool comprises a machine derating front-end engine used to generate machine derating information for the integrated circuit device design, and an application derating front-end engine used to generate application derating information for the integrated circuit device design; executing in the data processing system, by the unified derating tool, the machine derating front-end engine on a simulation of the integrated circuit device design to generate the machine derating information; executing in the data processing system, by the unified derating tool, the application derating front-end engine to execute an application workload on existing hardware similar in architecture to the integrated circuit device design and inject a fault into the existing hardware during execution of the application workload on the existing hardware to generate application derating information; and combining, by the data processing system, the machine derating information with the application derating information to generate at least one soft error rate (SER) value for the integrated circuit device design.
2. The method of claim 1, wherein the machine derating information comprises one or more machine derating factors identifying a relative measure of faults masked, recovered, or otherwise not detectable in an output of the integrated circuit device design, and wherein the application derating information comprises one or more application derating factors identifying a relative measure, at a level of an application-visible state, of faults masked, recovered, or otherwise not detectable in an output of the application.
3. The method of claim 1, wherein executing the machine derating front-end engine on a simulation of the integrated circuit device design comprises using an integrated circuit device design parameter file, an application workload, a profiler engine, and a residency analyzer to obtain machine derating information for the integrated circuit device design, wherein the profiler engine obtains information about the execution of the application workload on a simulation of the integrated circuit device design as defined by the integrated circuit device design parameter file, and wherein the residency analyzer generates residency statistics for elements of the integrated circuit device design, wherein the residency statistics identify how much execution time of the execution of the application workload is attributable to each of the elements.
4. The method of claim 1, wherein executing the application derating front-end engine comprises using an application fault injector to inject the fault into the existing hardware during execution of the application workload by: stopping execution of the application workload on the existing hardware; reading a current architected state of the existing hardware; selecting one or more elements of the existing hardware into which the fault is to be injected; modifying a state of the selected one or more elements of the existing hardware to have a different state representative of an injected fault; writing the modified state back to the selected one or more elements of the existing hardware; and resuming execution of the application workload on the existing hardware.
5. The method of claim 4, wherein the application fault injector operates based on fault injection parameters input to the application fault injector, wherein the fault injection parameters specify at least a timing of the fault injection performed by the application fault injector and whether the selection of the one or more elements is directed or random.
6. The method of claim 4, wherein the application derating front-end engine executes the application workload on the existing hardware a plurality of times in an iterative process and in each iteration causes the application fault injector to inject a fault into the existing hardware during execution of the application workload, and wherein the application derating information is a statistical representation of the results of each iteration of execution of the application workload on the existing hardware.
7. The method of claim 4, wherein the application fault injector is a process trace (ptrace) tool used for debugging of an application that is modified to inject the fault.
8. The method of claim 1, wherein combining the machine derating information with the application derating information to generate at least one SER value for the integrated circuit device design comprises calculating the at least one SER value based on a raw SER value for the integrated circuit device design and a product of the machine derating information and the application derating information.
9. The method of claim 1, further comprising: determining, by the data processing system, at least one mean time to failure (MTTF) value based on the at least one soft error rate (SER) value for the integrated circuit device design; comparing the MTTF value to a target MTTF value to determine if the MTTF value is smaller than the target MTTF value; and in response to the MTTF value being smaller than the target MTTF value, generating an indication of a need to modify the integrated circuit device design to increase the MTTF value.
10. The method of claim 1, wherein the execution of the machine derating front-end engine and the execution of the application derating front-end engine are performed in parallel at substantially a same time.
11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: implement a unified derating tool, wherein the unified derating tool comprises a machine derating front-end engine used to generate machine derating information for the integrated circuit device design, and an application derating front-end engine used to generate application derating information for the integrated circuit device design; execute, by the unified derating tool, the machine derating front-end engine on a simulation of the integrated circuit device design to generate the machine derating information; execute, by the unified derating tool, the application derating front-end engine to execute an application workload on existing hardware similar in architecture to the integrated circuit device design and inject a fault into the existing hardware during execution of the application workload on the existing hardware to generate application derating information; and combine the machine derating information with the application derating information to generate at least one soft error rate (SER) value for the integrated circuit device design.
12. The computer program product of claim 11, wherein the machine derating information comprises one or more machine derating factors identifying a relative measure of faults masked, recovered, or otherwise not detectable in an output of the integrated circuit device design, and wherein the application derating information comprises one or more application derating factors identifying a relative measure, at a level of an application-visible state, of faults masked, recovered, or otherwise not detectable in an output of the application.
13. The computer program product of claim 11, wherein the computer readable instructions cause the computing device to execute the machine derating front-end engine on a simulation of the integrated circuit device design by using an integrated circuit device design parameter file, an application workload, a profiler engine, and a residency analyzer to obtain machine derating information for the integrated circuit device design, wherein the profiler engine obtains information about the execution of the application workload on a simulation of the integrated circuit device design as defined by the integrated circuit device design parameter file, and wherein the residency analyzer generates residency statistics for elements of the integrated circuit device design, wherein the residency statistics identify how much execution time of the execution of the application workload is attributable to each of the elements.
14. The computer program product of claim 11, wherein the computer readable instructions cause the computing device to execute the application derating front-end engine by using an application fault injector to inject the fault into the existing hardware during execution of the application workload by: stopping execution of the application workload on the existing hardware; reading a current architected state of the existing hardware; selecting one or more elements of the existing hardware into which the fault is to be injected; modifying a state of the selected one or more elements of the existing hardware to have a different state representative of an injected fault; writing the modified state back to the selected one or more elements of the existing hardware; and resuming execution of the application workload on the existing hardware.
15. The computer program product of claim 14, wherein the application fault injector operates based on fault injection parameters input to the application fault injector, wherein the fault injection parameters specify at least a timing of the fault injection performed by the application fault injector and whether the selection of the one or more elements is directed or random.
16. The computer program product of claim 14, wherein the application derating front-end engine executes the application workload on the existing hardware a plurality of times in an iterative process and in each iteration causes the application fault injector to inject a fault into the existing hardware during execution of the application workload, and wherein the application derating information is a statistical representation of the results of each iteration of execution of the application workload on the existing hardware.
17. The computer program product of claim 14, wherein the application fault injector is a process trace (ptrace) tool used for debugging of an application that is modified to inject the fault.
18. The computer program product of claim 11, wherein combining the machine derating information with the application derating information to generate at least one SER value for the integrated circuit device design comprises calculating the at least one SER value based on a raw SER value for the integrated circuit device design and a product of the machine derating information and the application derating information.
19. The computer program product of claim 11, wherein the computer readable instructions further cause the computing device to: determine at least one mean time to failure (MTTF) value based on the at least one soft error rate (SER) value for the integrated circuit device design; compare the MTTF value to a target MTTF value to determine if the MTTF value is smaller than the target MTTF value; and in response to the MTTF value being smaller than the target MTTF value, generate an indication of a need to modify the integrated circuit device design to increase the MTTF value.
20. The computer program product of claim 11, wherein the execution of the machine derating front-end engine and the execution of the application derating front-end engine are performed in parallel at substantially a same time.
21. An apparatus, comprising: a processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to: implement a unified derating tool, wherein the unified derating tool comprises a machine derating front-end engine used to generate machine derating information for the integrated circuit device design, and an application derating front-end engine used to generate application derating information for the integrated circuit device design; execute, by the unified derating tool, the machine derating front-end engine on a simulation of the integrated circuit device design to generate the machine derating information; execute, by the unified derating tool, the application derating front-end engine to execute an application workload on existing hardware similar in architecture to the integrated circuit device design and inject a fault into the existing hardware during execution of the application workload on the existing hardware to generate application derating information; and combine the machine derating information with the application derating information to generate at least one soft error rate (SER) value for the integrated circuit device design.
22. The apparatus of claim 21, wherein the instructions cause the processor to execute the application derating front-end engine by using an application fault injector to inject the fault into the existing hardware during execution of the application workload by: stopping execution of the application workload on the existing hardware; reading a current architected state of the existing hardware; selecting one or more elements of the existing hardware into which the fault is to be injected; modifying a state of the selected one or more elements of the existing hardware to have a different state representative of an injected fault; writing the modified state back to the selected one or more elements of the existing hardware; and resuming execution of the application workload on the existing hardware.
23. The apparatus of claim 22, wherein the application fault injector operates based on fault injection parameters input to the application fault injector, wherein the fault injection parameters specify at least a timing of the fault injection performed by the application fault injector and whether the selection of the one or more elements is directed or random.
24. The apparatus of claim 22, wherein the application derating front-end engine executes the application workload on the existing hardware a plurality of times in an iterative process and in each iteration causes the application fault injector to inject a fault into the existing hardware during execution of the application workload, and wherein the application derating information is a statistical representation of the results of each iteration of execution of the application workload on the existing hardware.
25. The apparatus of claim 22, wherein the application fault injector is a process trace (ptrace) tool used for debugging of an application that is modified to inject the fault.
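By way of illustration only, and forming no part of the claims, the injection loop recited in claims 4 and 6 and the arithmetic recited in claims 8 and 9 might be sketched as follows. Everything named here is hypothetical: the three 16-bit registers, the toy workload, and the numeric values are invented for the example, and the architected state is a plain dictionary rather than real hardware accessed through a ptrace-style tool. The sketch also assumes that each derating factor is expressed as the fraction of faults that are not masked, so that the product scales the raw SER down. It further assumes that SER is expressed in FIT (failures per 10^9 device-hours) and that the failure rate is constant, so that MTTF in hours is 10^9 divided by the SER.

```python
from itertools import product

WIDTH = 16                 # hypothetical register width in bits
REGS = ("r0", "r1", "r2")  # hypothetical architected registers


def workload(state):
    """Toy application: only the low byte of r0 + r1 reaches the output."""
    return (state["r0"] + state["r1"]) & 0xFF


def inject_fault(state, reg, bit):
    """Read the state, flip one bit of the selected element, write it back."""
    faulty = dict(state)     # read the current architected state
    faulty[reg] ^= 1 << bit  # modify the selected element
    return faulty            # state the workload resumes with


def unmasked_fraction(state):
    """Directed campaign: one run per (register, bit); count changed outputs."""
    golden = workload(state)
    runs = [workload(inject_fault(state, r, b))
            for r, b in product(REGS, range(WIDTH))]
    return sum(out != golden for out in runs) / len(runs)


def derated_ser(raw_ser_fit, machine_unmasked, app_unmasked):
    """Raw SER scaled by the product of the two (unmasked-fraction) factors."""
    return raw_ser_fit * machine_unmasked * app_unmasked


def mttf_hours(ser_fit):
    """MTTF in hours for a constant failure rate given in FIT."""
    return 1e9 / ser_fit


def needs_redesign(mttf, target_mttf):
    """Flag the design when its MTTF falls short of the target."""
    return mttf < target_mttf


state = {"r0": 0x1234, "r1": 0x00FF, "r2": 7}
app = unmasked_fraction(state)         # 16 of the 48 single-bit faults alter the output
ser = derated_ser(3000.0, 0.1, app)    # raw SER and machine factor are made-up inputs
flag = needs_redesign(mttf_hours(ser), target_mttf=2e7)
```

The campaign here is exhaustive and directed so that the result is reproducible. A random selection of the element and injection time, as contemplated by the fault injection parameters, would instead yield a statistical estimate whose precision depends on the number of iterations.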